The goal of this project is to predict the price of a smartphone from its characteristics by applying supervised learning techniques to an open data set obtained from kaggle.com.
First, we load the libraries that will be used throughout the study, and then the data set.
library(dplyr) # selecting variables
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(mice) # missing-data patterns and imputation
##
## Attaching package: 'mice'
## The following object is masked from 'package:stats':
##
## filter
## The following objects are masked from 'package:base':
##
## cbind, rbind
library(ggplot2) # plots
library(forecast) # plots
## Warning: package 'forecast' was built under R version 4.3.2
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(gridExtra) # plots
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
library(grid) # plots
library(plotly) # plots
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
## ✔ readr 2.1.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ gridExtra::combine() masks dplyr::combine()
## ✖ plotly::filter() masks mice::filter(), dplyr::filter(), stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(MASS)
##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:plotly':
##
## select
##
## The following object is masked from 'package:dplyr':
##
## select
library(caret) # machine learning
## Loading required package: lattice
##
## Attaching package: 'caret'
##
## The following object is masked from 'package:purrr':
##
## lift
library(e1071)
library(skimr)
## Warning: package 'skimr' was built under R version 4.3.2
library(VIM)
## Loading required package: colorspace
## The legacy packages maptools, rgdal, and rgeos, underpinning the sp package,
## which was just loaded, will retire in October 2023.
## Please refer to R-spatial evolution reports for details, especially
## https://r-spatial.org/r/2023/05/15/evolution4.html.
## It may be desirable to make the sf package available;
## package maintainers should consider adding sf to Suggests:.
## The sp package is now running under evolution status 2
## (status 2 uses the sf package in place of rgdal)
## VIM is ready to use.
##
## Suggestions and bug-reports can be submitted at: https://github.com/statistikat/VIM/issues
##
## Attaching package: 'VIM'
##
## The following object is masked from 'package:datasets':
##
## sleep
library(reshape2) # melting data for plotting
##
## Attaching package: 'reshape2'
##
## The following object is masked from 'package:tidyr':
##
## smiths
library(GGally) # correlations
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(glmnet)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
##
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Loaded glmnet 4.1-8
library(rpart)
library(pROC)
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
##
## The following object is masked from 'package:colorspace':
##
## coords
##
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
library(class)
library(randomForest) # random forests machine learning
## Warning: package 'randomForest' was built under R version 4.3.2
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
##
## The following object is masked from 'package:gridExtra':
##
## combine
##
## The following object is masked from 'package:ggplot2':
##
## margin
##
## The following object is masked from 'package:dplyr':
##
## combine
library(gbm) # Gradient Boosting
## Warning: package 'gbm' was built under R version 4.3.2
## Loaded gbm 2.1.8.1
library(xgboost)
## Warning: package 'xgboost' was built under R version 4.3.2
##
## Attaching package: 'xgboost'
##
## The following object is masked from 'package:plotly':
##
## slice
##
## The following object is masked from 'package:dplyr':
##
## slice
library(glmnet) # Ridge Regression
library(leaflet)
Loading of the data set.
rm(list = ls())
data = read.csv("ndtv_data_final.csv")
glimpse(data)
## Rows: 1,359
## Columns: 22
## $ X <int> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 1…
## $ Name <chr> "OnePlus 7T Pro McLaren Edition", "Realme X2 Pr…
## $ Brand <chr> "OnePlus", "Realme", "Apple", "Apple", "LG", "O…
## $ Model <chr> "7T Pro McLaren Edition", "X2 Pro", "iPhone 11 …
## $ Battery.capacity..mAh. <int> 4085, 4000, 3969, 3110, 4000, 3800, 4085, 4300,…
## $ Screen.size..inches. <dbl> 6.67, 6.50, 6.50, 6.10, 6.40, 6.55, 6.67, 6.80,…
## $ Touchscreen <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"…
## $ Resolution.x <int> 1440, 1080, 1242, 828, 1080, 1080, 1440, 1440, …
## $ Resolution.y <int> 3120, 2400, 2688, 1792, 2340, 2400, 3120, 3040,…
## $ Processor <int> 8, 8, 6, 6, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8,…
## $ RAM..MB. <int> 12000, 6000, 4000, 4000, 6000, 8000, 8000, 1200…
## $ Internal.storage..GB. <dbl> 256, 64, 64, 64, 128, 128, 256, 256, 128, 128, …
## $ Rear.camera <dbl> 48.0, 64.0, 12.0, 12.0, 12.0, 48.0, 48.0, 12.0,…
## $ Front.camera <dbl> 16, 16, 12, 12, 32, 16, 16, 10, 24, 20, 16, 16,…
## $ Operating.system <chr> "Android", "Android", "iOS", "iOS", "Android", …
## $ Wi.Fi <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"…
## $ Bluetooth <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"…
## $ GPS <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "No", "Yes",…
## $ Number.of.SIMs <int> 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2,…
## $ X3G <chr> "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",…
## $ X4G..LTE <chr> "Yes", "Yes", "Yes", "Yes", "No", "Yes", "Yes",…
## $ Price <int> 58998, 27999, 106900, 62900, 49990, 34930, 5299…
set.seed(321)
The data set contains the 22 variables listed in the output above.
That is a lot of variables, and some of them will not be used because we judge them to be poor predictors that only add noise to the data set. Those are:
X: index -> int -> The row index carries no information about the phone’s price.
Name: Name of the phone -> chr -> A long string that combines brand and model; it adds noise, and the Brand variable is enough for this study.
Model: Model of the phone -> chr -> A long string that adds noise; the Brand variable is enough for this study.
# We eliminate the 3 variables
# data = data %>% select(-X, -Name, -Model)
data$X = NULL
data$Name = NULL
data$Model = NULL
Next, the character variables are converted to factors so that the supervised learning tools can treat them as categorical.
# Factorise, keeping the original level names
data$Brand = factor(data$Brand)
data$Operating.system = factor(data$Operating.system)
# Factorise with 1's and 0's (1 = Yes; 0 = No)
data$Touchscreen = factor(data$Touchscreen, levels = c("Yes", "No"),
labels = c(1, 0))
data$Wi.Fi = factor(data$Wi.Fi, levels = c("Yes", "No"),
labels = c(1, 0))
data$Bluetooth = factor(data$Bluetooth, levels = c("Yes", "No"),
labels = c(1, 0))
data$GPS = factor(data$GPS, levels = c("Yes", "No"),
labels = c(1, 0))
data$X3G = factor(data$X3G, levels = c("Yes", "No"),
labels = c(1, 0))
data$X4G..LTE = factor(data$X4G..LTE, levels = c("Yes", "No"),
labels = c(1, 0))
# We will also factorise the Number of SIMs as it takes 1, 2 or 3
data$Number.of.SIMs = factor(data$Number.of.SIMs, levels = c(1, 2, 3),
labels = c(1, 2, 3))
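The six Yes/No columns above are all recoded in the same way, so the conversion could also be written once and applied to every column. A minimal sketch, assuming a small toy data frame (`toy`) in place of the real data and a hypothetical helper `yes_no_to_factor()` that is not part of the original analysis:

```r
# Hypothetical helper: recode a "Yes"/"No" character vector as a factor
# with levels "1" (Yes) and "0" (No), in that order.
yes_no_to_factor <- function(x) {
  factor(x, levels = c("Yes", "No"), labels = c(1, 0))
}

# Toy data standing in for the real data set (values are illustrative only)
toy <- data.frame(
  Wi.Fi     = c("Yes", "Yes", "No"),
  Bluetooth = c("Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Apply the same recoding to every binary column
binary_cols <- c("Wi.Fi", "Bluetooth")
toy[binary_cols] <- lapply(toy[binary_cols], yes_no_to_factor)

str(toy)
```

Because "Yes" is listed first, level "1" comes before level "0". This ordering matters whenever plot legends are labelled by position.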
Next, we convert the smartphone prices from Indian Rupees to euros. The exchange rate at the time of writing is 1 INR = 0.011069359 €. We round the result so that prices remain whole numbers.
data$Price = round(data$Price * 0.011069359)
summary(data)
## Brand Battery.capacity..mAh. Screen.size..inches. Touchscreen
## Intex :117 Min. :1010 Min. :2.400 1:1342
## Samsung :101 1st Qu.:2300 1st Qu.:5.000 0: 17
## Micromax : 71 Median :3000 Median :5.200
## Lava : 59 Mean :2938 Mean :5.291
## Panasonic: 55 3rd Qu.:3500 3rd Qu.:5.700
## Vivo : 52 Max. :6000 Max. :7.300
## (Other) :904
## Resolution.x Resolution.y Processor RAM..MB.
## Min. : 240.0 Min. : 320 Min. : 1.000 Min. : 64
## 1st Qu.: 720.0 1st Qu.:1280 1st Qu.: 4.000 1st Qu.: 1000
## Median : 720.0 Median :1280 Median : 4.000 Median : 2000
## Mean : 811.5 Mean :1491 Mean : 5.551 Mean : 2489
## 3rd Qu.:1080.0 3rd Qu.:1920 3rd Qu.: 8.000 3rd Qu.: 3000
## Max. :2160.0 Max. :3840 Max. :10.000 Max. :12000
##
## Internal.storage..GB. Rear.camera Front.camera Operating.system
## Min. : 0.064 Min. : 0.00 Min. : 0.000 Android :1299
## 1st Qu.: 8.000 1st Qu.: 8.00 1st Qu.: 2.000 BlackBerry: 10
## Median : 16.000 Median : 12.20 Median : 5.000 Cyanogen : 10
## Mean : 30.655 Mean : 12.07 Mean : 7.038 iOS : 17
## 3rd Qu.: 32.000 3rd Qu.: 13.00 3rd Qu.: 8.000 Sailfish : 1
## Max. :512.000 Max. :108.00 Max. :48.000 Tizen : 3
## Windows : 19
## Wi.Fi Bluetooth GPS Number.of.SIMs X3G X4G..LTE Price
## 1:1351 1:1344 1:1251 1: 227 1:1214 1:1012 Min. : 5.0
## 0: 8 0: 15 0: 108 2:1131 0: 145 0: 347 1st Qu.: 53.0
## 3: 1 Median : 77.0
## Mean : 126.9
## 3rd Qu.: 133.0
## Max. :1937.0
##
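As a sanity check on the conversion, the first three prices printed by `glimpse()` (58998, 27999 and 106900 INR) can be converted by hand. The sketch below wraps the same arithmetic in a hypothetical helper, `inr_to_eur()`, which is not part of the original code; the default rate is the one quoted in the text.

```r
# Hypothetical helper: convert Indian Rupees to whole euros.
# The default rate is the one quoted in the text, not a live rate.
inr_to_eur <- function(price_inr, rate = 0.011069359) {
  round(price_inr * rate)
}

inr_to_eur(c(58998, 27999, 106900))
# 653  310 1183
```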
Looking at the data again, there appear to be no NA values. However, missing values in this data set may be encoded as 0’s in the features Rear.camera and Front.camera. Let’s take a look at those values:
# Before anything we will use the library mice to be sure that there are no NAs
md.pattern(data, rotate.names = TRUE)
## /\ /\
## { `---' }
## { O O }
## ==> V <== No need for mice. This data set is completely observed.
## \ \|/ /
## `-----'
## Brand Battery.capacity..mAh. Screen.size..inches. Touchscreen Resolution.x
## 1359 1 1 1 1 1
## 0 0 0 0 0
## Resolution.y Processor RAM..MB. Internal.storage..GB. Rear.camera
## 1359 1 1 1 1 1
## 0 0 0 0 0
## Front.camera Operating.system Wi.Fi Bluetooth GPS Number.of.SIMs X3G
## 1359 1 1 1 1 1 1 1
## 0 0 0 0 0 0 0
## X4G..LTE Price
## 1359 1 1 0
## 0 0 0
# No NAs explicitly
# Amount of 0's
sum(data$Rear.camera == 0) # 2 with no "normal" camera
## [1] 2
sum(data$Front.camera == 0) # 18 with no "selfie" camera
## [1] 18
# We see a small amount of 0 values
# Let's check those phones
data[which(data$Rear.camera == 0), ]
data[which(data$Front.camera == 0), ]
The phones with no rear camera are low-budget devices whose specifications suggest a target audience of people who just want to make calls. Most phones with no front camera follow the same pattern: low-budget devices aimed at the same audience. Keeping those observations is therefore reasonable, as their 0’s make sense. However, for two smartphones the 0 appears to be a missing value: smartphone 429 (an Oppo priced at 111€) and smartphone 645 (a Samsung priced at 398€). One option would be to drop both. Nonetheless, the Samsung is a well-specified model, and comparing it with phones of the same brand in a similar price range shows that the front camera usually has about 2/3 of the resolution of the rear camera. Hence, for this model we set the front camera to 2/3 of its rear camera. For the Oppo, comparable models with a rear camera of around 13 MP have a front camera of 8 MP, so we set that value to 8 and continue with the study.
data[which(data$Brand == "Samsung" & data$Price > 300 & data$Price < 500), ] # comparison from where we deduce the Rear - Front camera Ratio
data[645, ]$Front.camera = round(data[645, ]$Rear.camera * (2 / 3))
# For the Oppo we do a similar analysis
data[which(data$Brand == "Oppo" & data$Price > 90 & data$Price < 130), ]
data[429, ]$Front.camera = 8
summary(data)
## Brand Battery.capacity..mAh. Screen.size..inches. Touchscreen
## Intex :117 Min. :1010 Min. :2.400 1:1342
## Samsung :101 1st Qu.:2300 1st Qu.:5.000 0: 17
## Micromax : 71 Median :3000 Median :5.200
## Lava : 59 Mean :2938 Mean :5.291
## Panasonic: 55 3rd Qu.:3500 3rd Qu.:5.700
## Vivo : 52 Max. :6000 Max. :7.300
## (Other) :904
## Resolution.x Resolution.y Processor RAM..MB.
## Min. : 240.0 Min. : 320 Min. : 1.000 Min. : 64
## 1st Qu.: 720.0 1st Qu.:1280 1st Qu.: 4.000 1st Qu.: 1000
## Median : 720.0 Median :1280 Median : 4.000 Median : 2000
## Mean : 811.5 Mean :1491 Mean : 5.551 Mean : 2489
## 3rd Qu.:1080.0 3rd Qu.:1920 3rd Qu.: 8.000 3rd Qu.: 3000
## Max. :2160.0 Max. :3840 Max. :10.000 Max. :12000
##
## Internal.storage..GB. Rear.camera Front.camera Operating.system
## Min. : 0.064 Min. : 0.00 Min. : 0.000 Android :1299
## 1st Qu.: 8.000 1st Qu.: 8.00 1st Qu.: 2.000 BlackBerry: 10
## Median : 16.000 Median : 12.20 Median : 5.000 Cyanogen : 10
## Mean : 30.655 Mean : 12.07 Mean : 7.067 iOS : 17
## 3rd Qu.: 32.000 3rd Qu.: 13.00 3rd Qu.: 8.000 Sailfish : 1
## Max. :512.000 Max. :108.00 Max. :48.000 Tizen : 3
## Windows : 19
## Wi.Fi Bluetooth GPS Number.of.SIMs X3G X4G..LTE Price
## 1:1351 1:1344 1:1251 1: 227 1:1214 1:1012 Min. : 5.0
## 0: 8 0: 15 0: 108 2:1131 0: 145 0: 347 1st Qu.: 53.0
## 3: 1 Median : 77.0
## Mean : 126.9
## 3rd Qu.: 133.0
## Max. :1937.0
##
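The ratio-based fill used for the Samsung can also be expressed as a small function. This is only a sketch on made-up rows: the `toy` data frame and the helper `impute_front_from_rear()` are assumptions for illustration, not part of the original code. It fills the front camera of the selected rows with two thirds of the rear camera, as described in the text.

```r
# Hypothetical helper: fill the front camera of selected rows
# from the rear camera, using a fixed ratio.
impute_front_from_rear <- function(df, rows, ratio = 2 / 3) {
  df$Front.camera[rows] <- round(df$Rear.camera[rows] * ratio)
  df
}

# Made-up rows; the second one plays the role of a phone whose
# front camera was recorded as 0 although it should have one.
toy <- data.frame(
  Rear.camera  = c(13, 12, 5),
  Front.camera = c(8, 0, 0)
)

toy <- impute_front_from_rear(toy, rows = 2)
toy$Front.camera
# 8 8 0
```

Only the rows passed in `rows` are changed, so genuine zeros (such as the third toy phone) are left untouched.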
Now, let’s turn to the handling of outliers. Before drawing any conclusion about outliers, we must consider the nature of the topic and of the data set. Mobile phones vary a lot, and those differences between devices are crucial: they define almost perfectly the target audience of each device. Some companies focus on a low-budget audience, others on reliability and power efficiency, others on high-end devices… Also, depending on a company’s market cap and strategy, it may offer a wider variety of products than others.
In short, this part of the feature engineering should be taken only as an indication of how the data set is distributed. The 3-sigma rule, the IQR rule and distribution plots will be used for that purpose.
# 3-sigma rule and IQR (numerical variables only)
# Battery - (Just 3 "outliers")
mu = mean(data$Battery.capacity..mAh.)
sigma = sd(data$Battery.capacity..mAh.)
sum(data$Battery.capacity..mAh. < mu - 3 * sigma |
data$Battery.capacity..mAh. > mu + 3 * sigma)
## [1] 3
QI = quantile(data$Battery.capacity..mAh., 0.25)
QS = quantile(data$Battery.capacity..mAh., 0.75)
IQR = QS - QI
sum(data$Battery.capacity..mAh. < QI - 1.5*IQR |
data$Battery.capacity..mAh. > QS + 1.5*IQR)
## [1] 3
# Screen - (11 from 3-sigma and 22 from IQR)
mu = mean(data$Screen.size..inches.)
sigma = sd(data$Screen.size..inches.)
sum(data$Screen.size..inches. < mu - 3 * sigma |
data$Screen.size..inches. > mu + 3 * sigma)
## [1] 11
QI = quantile(data$Screen.size..inches., 0.25)
QS = quantile(data$Screen.size..inches., 0.75)
IQR = QS - QI
sum(data$Screen.size..inches. < QI - 1.5*IQR |
data$Screen.size..inches. > QS + 1.5*IQR)
## [1] 22
# Resolution X - (Just 3 "outliers")
mu = mean(data$Resolution.x)
sigma = sd(data$Resolution.x)
sum(data$Resolution.x < mu - 3 * sigma |
data$Resolution.x > mu + 3 * sigma)
## [1] 3
QI = quantile(data$Resolution.x, 0.25)
QS = quantile(data$Resolution.x, 0.75)
IQR = QS - QI
sum(data$Resolution.x < QI - 1.5*IQR |
data$Resolution.x > QS + 1.5*IQR)
## [1] 3
# Resolution Y - (5 from 3-sigma and 21 from IQR)
mu = mean(data$Resolution.y)
sigma = sd(data$Resolution.y)
sum(data$Resolution.y < mu - 3 * sigma |
data$Resolution.y > mu + 3 * sigma)
## [1] 5
QI = quantile(data$Resolution.y, 0.25)
QS = quantile(data$Resolution.y, 0.75)
IQR = QS - QI
sum(data$Resolution.y < QI - 1.5*IQR |
data$Resolution.y > QS + 1.5*IQR)
## [1] 21
# Processor (No outliers)
mu = mean(data$Processor)
sigma = sd(data$Processor)
sum(data$Processor < mu - 3 * sigma |
data$Processor > mu + 3 * sigma)
## [1] 0
QI = quantile(data$Processor, 0.25)
QS = quantile(data$Processor, 0.75)
IQR = QS - QI
sum(data$Processor < QI - 1.5*IQR |
data$Processor > QS + 1.5*IQR)
## [1] 0
# RAM - (33 outliers)
mu = mean(data$RAM..MB.)
sigma = sd(data$RAM..MB.)
sum(data$RAM..MB. < mu - 3 * sigma |
data$RAM..MB. > mu + 3 * sigma)
## [1] 33
QI = quantile(data$RAM..MB., 0.25)
QS = quantile(data$RAM..MB., 0.75)
IQR = QS - QI
sum(data$RAM..MB. < QI - 1.5*IQR |
data$RAM..MB. > QS + 1.5*IQR)
## [1] 33
# Internal Storage - (10 from 3-sigma and 79 from IQR)
mu = mean(data$Internal.storage..GB.)
sigma = sd(data$Internal.storage..GB.)
sum(data$Internal.storage..GB. < mu - 3 * sigma |
data$Internal.storage..GB. > mu + 3 * sigma)
## [1] 10
QI = quantile(data$Internal.storage..GB., 0.25)
QS = quantile(data$Internal.storage..GB., 0.75)
IQR = QS - QI
sum(data$Internal.storage..GB. < QI - 1.5*IQR |
data$Internal.storage..GB. > QS + 1.5*IQR)
## [1] 79
# Rear Camera - (51 from 3-sigma and 91 from IQR)
mu = mean(data$Rear.camera)
sigma = sd(data$Rear.camera)
sum(data$Rear.camera < mu - 3 * sigma |
data$Rear.camera > mu + 3 * sigma)
## [1] 51
QI = quantile(data$Rear.camera, 0.25)
QS = quantile(data$Rear.camera, 0.75)
IQR = QS - QI
sum(data$Rear.camera < QI - 1.5*IQR |
data$Rear.camera > QS + 1.5*IQR)
## [1] 91
# Front camera - (26 from 3-sigma and 80 from IQR)
mu = mean(data$Front.camera)
sigma = sd(data$Front.camera)
sum(data$Front.camera < mu - 3 * sigma |
data$Front.camera > mu + 3 * sigma)
## [1] 26
QI = quantile(data$Front.camera, 0.25)
QS = quantile(data$Front.camera, 0.75)
IQR = QS - QI
sum(data$Front.camera < QI - 1.5*IQR |
data$Front.camera > QS + 1.5*IQR)
## [1] 80
# Price - (35 from 3-sigma and 142 from IQR)
mu = mean(data$Price)
sigma = sd(data$Price)
sum(data$Price < mu - 3 * sigma |
data$Price > mu + 3 * sigma)
## [1] 35
QI = quantile(data$Price, 0.25)
QS = quantile(data$Price, 0.75)
IQR = QS - QI
sum(data$Price < QI - 1.5*IQR |
data$Price > QS + 1.5*IQR)
## [1] 142
After this first look at the outliers, we see that RAM, internal storage, both cameras and Price have a considerable number of them.
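The per-variable blocks above repeat the same two computations. As a compact alternative, the sketch below defines a hypothetical helper, `count_outliers()`, and applies it to every numeric column at once. It runs here on R’s built-in `mtcars` data as a stand-in; the helper name is an assumption and not part of the original analysis.

```r
# Hypothetical helper: count the values flagged by the 3-sigma rule
# and by the 1.5 * IQR rule for one numeric vector.
count_outliers <- function(x) {
  mu    <- mean(x)
  sigma <- sd(x)
  q     <- quantile(x, c(0.25, 0.75), names = FALSE)
  iqr   <- q[2] - q[1]
  c(
    three_sigma = sum(x < mu - 3 * sigma | x > mu + 3 * sigma),
    iqr_rule    = sum(x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr)
  )
}

# Stand-in data; with the phone data one would pass the numeric columns of `data`
numeric_cols   <- mtcars[sapply(mtcars, is.numeric)]
outlier_counts <- t(sapply(numeric_cols, count_outliers))
outlier_counts
```

Using a lower-case name such as `iqr` also avoids masking the function `stats::IQR()`, which a variable called `IQR` shadows.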
# Plot the distributions of the numerical variables
data_numerical = data %>% dplyr::select(where(is.numeric))
# Standardise (z-scores) the numerical data so the boxplots are comparable
data_numerical = scale(data_numerical)
# Melt the data for plotting
melted_data = melt(data_numerical)
g1 = ggplot(melted_data, aes(x = Var2, y = value, fill = Var2)) +
geom_boxplot() +
scale_fill_manual(values = rainbow(11)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
g1
Although there are many outliers, given the nature of this topic they are genuine and important data, so we keep them. Before moving on to the Visualization part, we normalise the numerical predictors to the [0, 1] range (min-max scaling); Price is left in euros.
# Keep a non-normalised copy for later individual plots
data_old = data
data$Battery.capacity..mAh. = (data$Battery.capacity..mAh. -
min(data$Battery.capacity..mAh.)) /
(max(data$Battery.capacity..mAh.) -
min(data$Battery.capacity..mAh.))
data$Screen.size..inches. = (data$Screen.size..inches. -
min(data$Screen.size..inches.)) /
(max(data$Screen.size..inches.) -
min(data$Screen.size..inches.))
data$Resolution.x = (data$Resolution.x - min(data$Resolution.x)) /
(max(data$Resolution.x) - min(data$Resolution.x))
data$Resolution.y = (data$Resolution.y - min(data$Resolution.y)) /
(max(data$Resolution.y) - min(data$Resolution.y))
data$Processor = (data$Processor - min(data$Processor)) /
(max(data$Processor) - min(data$Processor))
data$RAM..MB. = (data$RAM..MB. - min(data$RAM..MB.)) /
(max(data$RAM..MB.) - min(data$RAM..MB.))
data$Internal.storage..GB. = (data$Internal.storage..GB. -
min(data$Internal.storage..GB.)) /
(max(data$Internal.storage..GB.) -
min(data$Internal.storage..GB.))
data$Rear.camera = (data$Rear.camera - min(data$Rear.camera)) /
(max(data$Rear.camera) - min(data$Rear.camera))
data$Front.camera = (data$Front.camera - min(data$Front.camera)) /
(max(data$Front.camera) - min(data$Front.camera))
#data$Price = (data$Price - min(data$Price)) / (max(data$Price) - min(data$Price))
glimpse(data)
## Rows: 1,359
## Columns: 19
## $ Brand <fct> OnePlus, Realme, Apple, Apple, LG, OnePlus, One…
## $ Battery.capacity..mAh. <dbl> 0.6162325, 0.5991984, 0.5929860, 0.4208417, 0.5…
## $ Screen.size..inches. <dbl> 0.8714286, 0.8367347, 0.8367347, 0.7551020, 0.8…
## $ Touchscreen <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ Resolution.x <dbl> 0.6250000, 0.4375000, 0.5218750, 0.3062500, 0.4…
## $ Resolution.y <dbl> 0.7954545, 0.5909091, 0.6727273, 0.4181818, 0.5…
## $ Processor <dbl> 0.7777778, 0.7777778, 0.5555556, 0.5555556, 0.7…
## $ RAM..MB. <dbl> 1.0000000, 0.4973190, 0.3297587, 0.3297587, 0.4…
## $ Internal.storage..GB. <dbl> 0.4999375, 0.1248906, 0.1248906, 0.1248906, 0.2…
## $ Rear.camera <dbl> 0.4444444, 0.5925926, 0.1111111, 0.1111111, 0.1…
## $ Front.camera <dbl> 0.3333333, 0.3333333, 0.2500000, 0.2500000, 0.6…
## $ Operating.system <fct> Android, Android, iOS, iOS, Android, Android, A…
## $ Wi.Fi <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ Bluetooth <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ GPS <fct> 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ Number.of.SIMs <fct> 2, 2, 2, 2, 1, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2,…
## $ X3G <fct> 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,…
## $ X4G..LTE <fct> 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1,…
## $ Price <dbl> 653, 310, 1183, 696, 553, 387, 587, 882, 421, 2…
summary(data)
## Brand Battery.capacity..mAh. Screen.size..inches. Touchscreen
## Intex :117 Min. :0.0000 Min. :0.0000 1:1342
## Samsung :101 1st Qu.:0.2585 1st Qu.:0.5306 0: 17
## Micromax : 71 Median :0.3988 Median :0.5714
## Lava : 59 Mean :0.3865 Mean :0.5901
## Panasonic: 55 3rd Qu.:0.4990 3rd Qu.:0.6735
## Vivo : 52 Max. :1.0000 Max. :1.0000
## (Other) :904
## Resolution.x Resolution.y Processor RAM..MB.
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.2500 1st Qu.:0.2727 1st Qu.:0.3333 1st Qu.:0.07842
## Median :0.2500 Median :0.2727 Median :0.3333 Median :0.16220
## Mean :0.2977 Mean :0.3326 Mean :0.5057 Mean :0.20315
## 3rd Qu.:0.4375 3rd Qu.:0.4545 3rd Qu.:0.7778 3rd Qu.:0.24598
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.00000
##
## Internal.storage..GB. Rear.camera Front.camera Operating.system
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Android :1299
## 1st Qu.:0.01550 1st Qu.:0.07407 1st Qu.:0.04167 BlackBerry: 10
## Median :0.03113 Median :0.11296 Median :0.10417 Cyanogen : 10
## Mean :0.05976 Mean :0.11176 Mean :0.14724 iOS : 17
## 3rd Qu.:0.06238 3rd Qu.:0.12037 3rd Qu.:0.16667 Sailfish : 1
## Max. :1.00000 Max. :1.00000 Max. :1.00000 Tizen : 3
## Windows : 19
## Wi.Fi Bluetooth GPS Number.of.SIMs X3G X4G..LTE Price
## 1:1351 1:1344 1:1251 1: 227 1:1214 1:1012 Min. : 5.0
## 0: 8 0: 15 0: 108 2:1131 0: 145 0: 347 1st Qu.: 53.0
## 3: 1 Median : 77.0
## Mean : 126.9
## 3rd Qu.: 133.0
## Max. :1937.0
##
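The column-by-column scaling above applies the same min-max formula, (x - min(x)) / (max(x) - min(x)), to each predictor. A shorter equivalent is sketched below with a hypothetical helper, `min_max()`. It is shown on a toy data frame whose values are the minimum, median and maximum of RAM and Price taken from the first `summary()` output; `Price` is deliberately left out of the scaled columns, as in the original code.

```r
# Hypothetical helper: rescale a numeric vector to the [0, 1] range
min_max <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Toy data: min, median and max of RAM and Price from the summary output
toy <- data.frame(
  RAM..MB. = c(64, 2000, 12000),
  Price    = c(5, 77, 1937)
)

# Scale every numeric column except the target
predictors <- setdiff(names(toy)[sapply(toy, is.numeric)], "Price")
toy[predictors] <- lapply(toy[predictors], min_max)
toy
```

The scaled median of RAM comes out at about 0.1622, which agrees with the normalised summary above.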
In the end, the data set that we will work with from now on is defined by the following features:
Brand: Brand Name -> fct with 76 levels of the different brands such as OnePlus, Xiaomi, Apple…
Battery capacity (mAh): Battery capacity in mAh -> dbl (normalised) int (non-normalised).
Screen size (inches): Screen Size in Inches across opposite corners -> dbl
Touchscreen: Whether the phone is touchscreen supported or not -> fct (1 = it has; 0 = it has NOT).
Resolution x: The resolution of the phone along the width of the screen -> dbl (normalised) int (non-normalised).
Resolution y: The resolution of the phone along the height of the screen -> dbl (normalised) int (non-normalised).
Processor: No. of processor cores -> dbl (normalised) int (non-normalised).
RAM (MB): RAM available in phone in MB -> dbl (normalised) int (non-normalised).
Internal storage: Internal Storage of phone in GB -> dbl (both normalised and non-normalised).
Rear camera: Resolution of rear camera in MP (0 if unavailable) -> dbl
Front camera: Resolution of front camera in MP (0 if unavailable) -> dbl
Operating system: OS used in phone -> fct with 7 levels of the different OS such as Android, iOS…
Wi-Fi: Whether phone has WiFi functionality -> fct (1 = it has; 0 = it has NOT).
Bluetooth: Whether phone has Bluetooth functionality -> fct (1 = it has; 0 = it has NOT).
GPS: Whether phone has GPS functionality -> fct (1 = it has; 0 = it has NOT).
Number of SIMs: Number of SIM card slots in phone -> fct with 3 levels (1, 2 or 3 SIMs).
3G: Whether phone has 3G network functionality -> fct (1 = it has; 0 = it has NOT).
4G/LTE: Whether phone has 4G/LTE network functionality -> fct (1 = it has; 0 = it has NOT).
Price: Price of the phone in € -> dbl (rounded to whole euros, not normalised).
In this part of the project we carry out a visual analysis of the data to better understand its behaviour and that of its features. We proceed as follows:
Amount of devices per Brand:
g2 = ggplot(data, aes(x = fct_infreq(Brand), fill = fct_infreq(Brand))) +
geom_bar() +
scale_fill_viridis_d(name = "Colours", direction = -1) + # Use a color gradient
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
labs(title = "Amount of devices per Brand", x = "Brands", y = "Amount")
g2
The most represented brands are Intex (117 devices), Samsung (101) and Micromax (71).
Percentages of Touchscreen, WiFi, Bluetooth, GPS, 3G and 4G devices:
# TOUCH SCREEN
# Count occurrences of each category in the 'Touchscreen' column
touchscreen_counts = table(data$Touchscreen)
# Calculate percentages
touchscreen_percentages = round(prop.table(touchscreen_counts) * 100, 2)
# Create a data frame with the counts and percentages for plotting
touchscreen_df = data.frame(
Touchscreen = names(touchscreen_counts),
Count = as.numeric(touchscreen_counts),
Percentage = touchscreen_percentages
)
# Create a pie chart with percentages using ggplot2
gPie1 = ggplot(touchscreen_df, aes(x = "", y = Count, fill = Touchscreen)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
geom_text(aes(label = paste0(Percentage.Freq, "%")),
position = position_stack(vjust = 0.5),
size = 5,
show.legend = FALSE) +
labs(fill = "TOUCHSCREEN") +
ggtitle("Distribution of Touchscreen") +
theme_void() +
scale_fill_manual(values = c("hotpink1", "mediumspringgreen"))
# WIFI
# Count occurrences of each category in the 'Wi.Fi' column
wifi_counts <- table(data$Wi.Fi)
# Calculate percentages
wifi_percentages <- round(prop.table(wifi_counts) * 100, 2)
# Create a data frame with the counts and percentages for plotting
wifi_df <- data.frame(
Wi.Fi = names(wifi_counts),
Count = as.numeric(wifi_counts),
Percentage = wifi_percentages
)
# Create a pie chart with percentages using ggplot2 for Wi.Fi
gPie2 = ggplot(wifi_df, aes(x = "", y = Count, fill = Wi.Fi)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
geom_text(aes(label = paste0(Percentage.Freq, "%")),
position = position_stack(vjust = 0.5),
size = 5,
show.legend = FALSE) +
labs(fill = "Wi.Fi") +
ggtitle("Distribution of WiFi") +
theme_void() +
scale_fill_manual(values = c("hotpink1", "mediumspringgreen"))
# BLUETOOTH
# Count occurrences of each category in the 'Bluetooth' column
bluetooth_counts = table(data$Bluetooth)
# Calculate percentages
bluetooth_percentages = round(prop.table(bluetooth_counts) * 100, 2)
# Create a data frame with the counts and percentages for plotting
bluetooth_df = data.frame(
Bluetooth = names(bluetooth_counts),
Count = as.numeric(bluetooth_counts),
Percentage = bluetooth_percentages
)
# Create a pie chart with percentages using ggplot2 for Bluetooth
gPie3 = ggplot(bluetooth_df, aes(x = "", y = Count, fill = Bluetooth)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
geom_text(aes(label = paste0(Percentage.Freq, "%")),
position = position_stack(vjust = 0.5),
size = 5,
show.legend = FALSE) +
labs(fill = "Bluetooth") +
ggtitle("Distribution of Bluetooth") +
theme_void() +
scale_fill_manual(values = c("hotpink1", "mediumspringgreen"))
# GPS
# Count occurrences of each category in the 'GPS' column
gps_counts = table(data$GPS)
# Calculate percentages
gps_percentages = round(prop.table(gps_counts) * 100, 2)
# Create a data frame with the counts and percentages for plotting
gps_df = data.frame(
GPS = names(gps_counts),
Count = as.numeric(gps_counts),
Percentage = gps_percentages
)
# Create a pie chart with percentages using ggplot2 for GPS
gPie4 = ggplot(gps_df, aes(x = "", y = Count, fill = GPS)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
geom_text(aes(label = paste0(Percentage.Freq, "%")),
position = position_stack(vjust = 0.5),
size = 5,
show.legend = FALSE) +
labs(fill = "GPS") +
ggtitle("Distribution of GPS") +
theme_void() +
scale_fill_manual(values = c("hotpink1", "mediumspringgreen"))
# 3G
# Count occurrences of each category in the 'X3G' column
x3g_counts = table(data$X3G)
# Calculate percentages
x3g_percentages = round(prop.table(x3g_counts) * 100, 2)
# Create a data frame with the counts and percentages for plotting
x3g_df = data.frame(
X3G = names(x3g_counts),
Count = as.numeric(x3g_counts),
Percentage = x3g_percentages
)
# Create a pie chart with percentages using ggplot2 for X3G
gPie5 = ggplot(x3g_df, aes(x = "", y = Count, fill = X3G)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
geom_text(aes(label = paste0(Percentage.Freq, "%")),
position = position_stack(vjust = 0.5),
size = 5,
show.legend = FALSE) +
labs(fill = "X3G") +
ggtitle("Distribution of 3G") +
theme_void() +
scale_fill_manual(values = c("hotpink1", "mediumspringgreen"))
# 4G LTE
# Count occurrences of each category in the 'X4G..LTE' column
x4g_lte_counts = table(data$X4G..LTE)
# Calculate percentages
x4g_lte_percentages = round(prop.table(x4g_lte_counts) * 100, 2)
# Create a data frame with the counts and percentages for plotting
x4g_lte_df = data.frame(
X4G_LTE = names(x4g_lte_counts),
Count = as.numeric(x4g_lte_counts),
Percentage = x4g_lte_percentages
)
# Create a pie chart with percentages using ggplot2 for X4G..LTE
gPie6 = ggplot(x4g_lte_df, aes(x = "", y = Count, fill = X4G_LTE)) +
geom_bar(stat = "identity", width = 1) +
coord_polar("y", start = 0) +
geom_text(aes(label = paste0(Percentage.Freq, "%")),
position = position_stack(vjust = 0.5),
size = 5,
show.legend = FALSE) +
labs(fill = "X4G..LTE") +
ggtitle("Distribution of 4G LTE") +
theme_void() +
scale_fill_manual(values = c("hotpink1", "mediumspringgreen"))
# All the Pie charts together
g3 = grid.arrange(gPie1, gPie2, gPie3, gPie4, gPie5, gPie6, ncol = 3)
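The six pie-chart blocks above differ only in the column and the title, so they could be generated by one function. The sketch below is one possible way to do it, not the original code: `share_table()` and `pie_chart()` are hypothetical helpers, the plotting helper assumes that the ggplot2 package is installed, and the toy factor reproduces the Touchscreen counts from the summary (1342 vs 17).

```r
# Hypothetical helper: counts and percentage shares of one factor
share_table <- function(x) {
  counts <- table(x)
  data.frame(
    Level      = names(counts),
    Count      = as.numeric(counts),
    Percentage = as.numeric(round(prop.table(counts) * 100, 2))
  )
}

# Hypothetical helper: pie chart of one factor (requires ggplot2)
pie_chart <- function(x, title) {
  df <- share_table(x)
  ggplot2::ggplot(df, ggplot2::aes(x = "", y = Count, fill = Level)) +
    ggplot2::geom_col(width = 1) +
    ggplot2::coord_polar("y", start = 0) +
    ggplot2::geom_text(ggplot2::aes(label = paste0(Percentage, "%")),
                       position = ggplot2::position_stack(vjust = 0.5)) +
    ggplot2::ggtitle(title) +
    ggplot2::theme_void()
}

# Toy factor with the Touchscreen counts from the summary output
touch <- factor(rep(c("1", "0"), times = c(1342, 17)), levels = c("1", "0"))
share_table(touch)
```

With the real data, a call such as `pie_chart(data$Wi.Fi, "Distribution of WiFi")` would stand in for one of the blocks, and the resulting plots could be passed to `grid.arrange()` as before.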
The share of devices without a touchscreen, WiFi or Bluetooth is negligible (around 1% or less each). Phones without GPS or 3G account for roughly 8% and 11% respectively, which points to a market segment whose target audience just wants a phone for calls. However, the roughly 25% of phones without a 4G LTE connection suggests that cheap phones probably lack that capability. This will be studied in the next plots.
4G LTE vs 3G vs GPS in terms of Price:
# Create a grouped bar plot
g4 = ggplot(data, aes(x = GPS, y = Price, fill = factor(X3G))) +
geom_bar(stat = "identity", position = "dodge", color = "black", alpha = 0.8) +
labs(x = "GPS Capability", y = "Price", fill = "X3G Capability",
title = "Price vs. GPS and X3G Capabilities") +
scale_fill_discrete(name = "X3G Capability", labels = c("Yes", "No")) + # levels are "1" (Yes), "0" (No)
theme_minimal()
g4
g5 = ggplot(data, aes(x = GPS, y = Price, fill = X4G..LTE)) +
geom_bar(stat = "identity", position = "dodge", color = "black", alpha = 0.8) +
labs(x = "GPS Capability", y = "Price", fill = "4G LTE Capability",
title = "Price vs. GPS and 4G LTE Capabilities") +
scale_fill_discrete(name = "4G LTE Capability", labels = c("No", "Yes")) +
theme_minimal()
g5
g6 = ggplot(data, aes(x = GPS, y = Price, fill = X3G)) +
geom_boxplot() + facet_grid(rows = vars(X4G..LTE)) +
labs(title = "Price vs. 4G LTE, GPS and X3G", x = "GPS", y = "Price") +
ylim(c(0,750))
g6
## Warning: Removed 19 rows containing non-finite values (`stat_boxplot()`).
We see that low-budget phones tend to be devices without GPS and without 3G. We also find a significant number of devices that have 4G LTE but not 3G.
Operating Systems:
OS and Prices:
g7 = ggplot(data_old) + aes(x = Operating.system, y = Price,
fill = Operating.system) +
geom_boxplot() + theme(legend.position = "none") +
labs(title = "Operating System and their Prices", x = "OS", y = "Price")
g7
From the plot it can be deduced that iOS devices are the most expensive, while Cyanogen, Sailfish and Tizen devices are not. Another thing to mention is the number of outliers among the Android devices: most of them cost less than 300€, but there is a considerable number of high-end phones.
OS devices:
g8 = ggplot(data, aes(x = fct_infreq(Operating.system))) +
geom_bar(fill = "skyblue", color = "black") +
scale_fill_viridis_d(direction = -1) +
labs(x = "Operating System", y = "Count", title = "OS count")
g8
The vast majority of devices run Android and, surprisingly, there are more Windows phones than iOS devices.
Numerical Features:
Battery Capacity:
g9 = ggplot(data_old, aes(x = Battery.capacity..mAh.)) +
geom_density(fill = "royalblue", color = "skyblue", alpha = 0.75) +
labs(x = "Battery capacity (mAh)", y = "Density", title = "Distribution of Battery Capacity")
g9
From the distribution we deduce that there are three main smartphone categories: the majority of phones use a battery between 2000 and 3000 mAh, high-end phones sit around 4000 mAh, and phones built for extreme battery life sit around 5000 mAh.
Cameras:
g10 = ggplot(data_old) + aes(x = Rear.camera, y = Front.camera) +
geom_count(color ="lightslateblue") + geom_smooth(method = "lm") +
labs(title = "Rear vs Front camera", x = "Rear camera", y = "Front camera")
g10
## `geom_smooth()` using formula = 'y ~ x'
The majority of the phones have lower-end cameras. Also, most phones have a better rear camera than front camera. Nonetheless, when the front camera is in the 20 to 40 MP range, it is most commonly better than the rear camera.
g11 = ggplot(data_old)+aes(x = Rear.camera, y = Front.camera) +
geom_point(aes(color = Price)) + scale_color_continuous(trans = "log")
g11
Studying camera quality against price yields no surprises: the lower the camera quality, the lower the price.
Screen’s Resolution:
g12 = plot_ly(data = data_old, x = ~Resolution.x, y = ~Resolution.y,
type = "scatter", mode = "markers") %>%
layout(title = "Screen's Resolution", xaxis = list(title = "X resolution"),
yaxis = list(title = "Y resolution"))
g12
Phone screens seem to have three main resolutions: 480p (480 x 800), 720p (720 x 1280) and 1080p plus (1080 x 2200). Note that the x and y resolutions are given in vertical orientation, as we are studying smartphones, not televisions or other horizontal monitors.
Screen Size vs Price by Most popular Brand:
# Brands with the most smartphones in the data set, plus Apple for comparison
top_brands = c("Intex", "Samsung", "Micromax", "Lava",
"Panasonic", "Vivo", "Xiaomi", "Apple")
ggplot(data_old %>% filter(Brand %in% top_brands)) +
aes(x = Screen.size..inches., y = Brand) +
geom_violin(alpha = 0.3, fill = "skyblue") + # Adjusted fill color
geom_jitter(aes(color = Price)) +
scale_color_viridis_c(trans = "log", direction = -1) + # Reversed color scale orientation
labs(title = "Screen Size vs Price by Brands", y = "Brands") + coord_flip()
We selected the brands with the highest number of smartphones, plus Apple, to compare prices. We saw that smartphones with bigger screens tend to be the most expensive, but only within a given brand: for instance, the smallest Lava device will be the cheapest Lava smartphone. Overall, from 5.5 inches upwards you cannot tell the price difference from the screen size alone, but you can if you also consider the brand.
In this section of the project, we delve into the implementation and evaluation of several supervised learning techniques. As we want to predict the price of a phone given its characteristics, we will divide the phones into two major groups to classify:
Expensive: phones that cost 100€ or more.
Cheap: phones that cost less than 100€.
Before applying any supervised classification technique, we examine the correlations with Price to see which variables are most relevant for prediction.
gCor1 = ggplot() + aes(x = cor(data_numerical)["Price",],
y = reorder(names(cor(data_numerical)["Price",]),
cor(data_numerical)["Price",])) +
geom_col(fill = "mediumorchid1") + labs(title = "Correlations",
x = "Correlation",y = "Variables") + theme_bw()
gCor1
gCor2 = ggcorr(data_numerical, label = TRUE)
gCor2
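The bar chart above recomputes `cor(data_numerical)` three times. A lighter variant, sketched here on made-up numeric data (`toy_num` is a hypothetical stand-in for `data_numerical`), computes the matrix once and keeps only the sorted "Price" row.

```r
set.seed(1)
# Toy numeric data (made up); in the report this would be data_numerical
toy_num <- data.frame(Price = rnorm(50))
toy_num$RAM     <- toy_num$Price * 2 + rnorm(50)  # positively related to Price
toy_num$Battery <- rnorm(50)                      # unrelated noise

# Compute the correlation matrix once and reuse its "Price" row
cor_price <- cor(toy_num)["Price", ]
cor_price <- sort(cor_price[names(cor_price) != "Price"], decreasing = TRUE)
cor_price
```

The resulting named vector can be passed to `ggplot()` through a small data frame of names and values, which avoids repeating the `cor()` call inside `aes()`.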
Now we create the price groups by adding a new variable named PriceClass:
data_old$PriceClass = factor(ifelse(data_old$Price < 100, "Cheap", "Expensive"))
levels(data_old$PriceClass)
## [1] "Cheap" "Expensive"
count <- table(data_old$PriceClass)
percentages <- prop.table(count) * 100
percentages
##
## Cheap Expensive
## 62.25166 37.74834
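Whether a phone priced at exactly 100 lands in Cheap or Expensive depends on the comparison operator used to build PriceClass. The following sketch, with three hypothetical prices around the boundary, shows the difference between `<` and `<=`.

```r
# Hypothetical prices around the 100 EUR boundary
price <- c(99.99, 100, 100.01)

# Same rule as in the report: strictly below 100 is "Cheap"
cls_lt  <- factor(ifelse(price <  100, "Cheap", "Expensive"))
# Alternative rule: 100 or below is "Cheap"
cls_lte <- factor(ifelse(price <= 100, "Cheap", "Expensive"))

data.frame(price, cls_lt, cls_lte)
```

With the `<` rule used in the report, a phone costing exactly 100 is labelled Expensive.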
data_classification <- data_old
data_classification$Price = NULL
# We need to remove the Brand name too, since it is not useful for the classification
data_classification$Brand = NULL
Now we divide in test and training sets.
spl = createDataPartition(data_classification$PriceClass, p = 0.8, list = FALSE)
PhonesTrain = data_classification[spl,]
PhonesTest = data_classification[-spl,]
t = table(PhonesTrain$PriceClass)
prop.table(t)
##
## Cheap Expensive
## 0.6222426 0.3777574
We can see that the data set is only moderately imbalanced: cheap phones make up 62% and expensive ones 38%.
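The split keeps the class shares because the sampling is done within each class, and it changes on every run unless a seed is set first. As a rough illustration of the idea (not caret's actual implementation), the sketch below performs a stratified 80/20 split in base R on a made-up label vector; `stratified_idx` is a hypothetical helper.

```r
set.seed(123)  # fixing the seed makes the split reproducible

# Made-up labels with roughly the report's 62/38 class balance
y <- factor(rep(c("Cheap", "Expensive"), times = c(62, 38)))

# Hypothetical helper: sample a fraction p of the row indices within each class
stratified_idx <- function(y, p = 0.8) {
  idx_by_class <- split(seq_along(y), y)
  sort(unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), size = round(p * length(idx)))]
  }), use.names = FALSE))
}

train_idx <- stratified_idx(y)
prop.table(table(y[train_idx]))   # class shares in the training part
prop.table(table(y[-train_idx]))  # class shares in the test part
```

Calling `set.seed()` before `createDataPartition()` in the report would likewise make the reported tables reproducible.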
table(PhonesTrain$PriceClass, PhonesTrain$Internal.storage..GB.)
##
## 0.16 0.512 1 2 3 4 8 16 32 64 128 256 512
## Cheap 1 8 2 1 1 44 233 259 111 17 0 0 0
## Expensive 0 0 0 0 0 5 23 83 115 125 53 6 1
ggplot(PhonesTrain, aes(x=PriceClass, fill = as.factor(Internal.storage..GB.))) + geom_bar()
ggplot(PhonesTrain, aes(x= as.factor(Internal.storage..GB.),fill = PriceClass)) + geom_bar()
The phones with the highest internal storage belong to the Expensive group, and those with the lowest storage belong to the Cheap group. However, phones with 16 and 32 GB of storage are split between the two groups.
Because we have binary classification, we can use the standard glm function in R:
logit.model <- glm(PriceClass ~ ., family=binomial(link='logit'), data=PhonesTrain)
summary(logit.model)
##
## Call:
## glm(formula = PriceClass ~ ., family = binomial(link = "logit"),
## data = PhonesTrain)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.159e+00 1.245e+00 -4.143 3.43e-05 ***
## Battery.capacity..mAh. -1.328e-04 1.583e-04 -0.839 0.401588
## Screen.size..inches. 1.409e-01 2.794e-01 0.504 0.614182
## Touchscreen0 -9.129e-02 1.134e+00 -0.080 0.935853
## Resolution.x 2.669e-03 1.200e-03 2.223 0.026190 *
## Resolution.y -1.111e-04 6.892e-04 -0.161 0.871935
## Processor 2.621e-02 5.697e-02 0.460 0.645402
## RAM..MB. 3.781e-04 1.803e-04 2.097 0.035963 *
## Internal.storage..GB. 4.210e-02 1.012e-02 4.161 3.18e-05 ***
## Rear.camera 8.740e-02 3.216e-02 2.717 0.006580 **
## Front.camera -2.196e-02 2.529e-02 -0.868 0.385146
## Operating.systemBlackBerry 2.007e+00 1.006e+00 1.994 0.046151 *
## Operating.systemCyanogen -5.328e-01 1.317e+00 -0.404 0.685862
## Operating.systemiOS 2.651e+00 1.137e+00 2.332 0.019722 *
## Operating.systemSailfish -1.266e+01 8.827e+02 -0.014 0.988561
## Operating.systemTizen -1.113e+01 6.240e+02 -0.018 0.985765
## Operating.systemWindows 9.766e-01 6.464e-01 1.511 0.130844
## Wi.Fi0 7.113e-01 1.496e+00 0.476 0.634418
## Bluetooth0 -1.439e+00 1.184e+00 -1.216 0.224053
## GPS0 -1.637e+00 4.745e-01 -3.450 0.000561 ***
## Number.of.SIMs2 -1.093e+00 2.749e-01 -3.975 7.04e-05 ***
## Number.of.SIMs3 -1.200e+01 8.827e+02 -0.014 0.989154
## X3G0 -1.627e+00 4.232e-01 -3.845 0.000121 ***
## X4G..LTE0 9.475e-01 2.840e-01 3.336 0.000850 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1442.59 on 1087 degrees of freedom
## Residual deviance: 798.58 on 1064 degrees of freedom
## AIC: 846.58
##
## Number of Fisher Scoring iterations: 13
probability <- predict(logit.model,newdata=PhonesTest, type='response')
head(probability)
## 7 8 9 12 17 18
## 0.9999998 0.9999992 0.9999491 0.9995674 0.9999932 0.9936506
prediction <- as.factor(ifelse(probability > 0.5,"Expensive","Cheap"))
head(prediction)
## 7 8 9 12 17 18
## Expensive Expensive Expensive Expensive Expensive Expensive
## Levels: Cheap Expensive
The confusion matrix is:
conf_log_reg = confusionMatrix(prediction, PhonesTest$PriceClass)
conf_log_reg
## Confusion Matrix and Statistics
##
## Reference
## Prediction Cheap Expensive
## Cheap 148 29
## Expensive 21 73
##
## Accuracy : 0.8155
## 95% CI : (0.7641, 0.8598)
## No Information Rate : 0.6236
## P-Value [Acc > NIR] : 5.346e-12
##
## Kappa : 0.6008
##
## Mcnemar's Test P-Value : 0.3222
##
## Sensitivity : 0.8757
## Specificity : 0.7157
## Pos Pred Value : 0.8362
## Neg Pred Value : 0.7766
## Prevalence : 0.6236
## Detection Rate : 0.5461
## Detection Prevalence : 0.6531
## Balanced Accuracy : 0.7957
##
## 'Positive' Class : Cheap
##
We can see that the accuracy obtained is pretty good (0.82), well above the no-information rate of 0.62.
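To make the statistics printed by `confusionMatrix` less of a black box, the sketch below recomputes the main ones by hand from the four counts reported above. The variable names are ours; the formulas are the standard definitions, with Cheap as the positive class.

```r
# Confusion matrix reported above (rows = prediction, columns = reference)
cm <- matrix(c(148, 21, 29, 73), nrow = 2,
             dimnames = list(Prediction = c("Cheap", "Expensive"),
                             Reference  = c("Cheap", "Expensive")))

n           <- sum(cm)
accuracy    <- sum(diag(cm)) / n                                      # (148 + 73) / 271
sensitivity <- cm["Cheap", "Cheap"] / sum(cm[, "Cheap"])              # positive class = Cheap
specificity <- cm["Expensive", "Expensive"] / sum(cm[, "Expensive"])
nir         <- max(colSums(cm)) / n                                   # always predict the majority class

# Cohen's kappa: agreement beyond what the marginals would give by chance
p_chance  <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_hat <- (accuracy - p_chance) / (1 - p_chance)

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, nir = nir, kappa = kappa_hat), 4)
```

The values agree with the printed output (accuracy 0.8155, sensitivity 0.8757, specificity 0.7157, Kappa 0.6008).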
Even though the dimension of the data set is not very high, we are going to try the penalized version:
p.logit.model <- glmnet(as.matrix(PhonesTrain[,-1]),PhonesTrain$PriceClass, family=c("binomial"), alpha=0, lambda=0.01)
## Warning in storage.mode(xd) <- "double": NAs introduced by coercion
probability <- predict(p.logit.model,as.matrix(PhonesTrain[,-1]), type='response')
## Warning in cbind2(1, newx) %*% nbeta: NAs introduced by coercion
prediction <- as.factor(ifelse(probability > 0.5,"Expensive","Cheap"))
conf_p_log_reg = confusionMatrix(prediction, PhonesTrain$PriceClass)
conf_p_log_reg
## Confusion Matrix and Statistics
##
## Reference
## Prediction Cheap Expensive
## Cheap 612 125
## Expensive 65 286
##
## Accuracy : 0.8254
## 95% CI : (0.8015, 0.8475)
## No Information Rate : 0.6222
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 0.6176
##
## Mcnemar's Test P-Value : 1.866e-05
##
## Sensitivity : 0.9040
## Specificity : 0.6959
## Pos Pred Value : 0.8304
## Neg Pred Value : 0.8148
## Prevalence : 0.6222
## Detection Rate : 0.5625
## Detection Prevalence : 0.6774
## Balanced Accuracy : 0.7999
##
## 'Positive' Class : Cheap
##
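An accuracy of 0.83 was obtained, which is no appreciable improvement. Note that this figure is measured on the training set (the predictions above use PhonesTrain), so it is not directly comparable with the test accuracy of the unpenalized model.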
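The coercion warnings above come from passing a data frame with factor columns through `as.matrix()`. One common way to build a numeric design matrix for `glmnet` is `model.matrix()`; the sketch below shows it on a made-up data frame (`toy` and its columns are hypothetical), with the `glmnet` calls left as comments because they are not run here.

```r
# Toy training frame mixing numeric and factor predictors (values are made up)
toy <- data.frame(
  RAM        = c(1024, 2048, 4096, 3072, 512, 8192),
  OS         = factor(c("Android", "iOS", "Android", "Windows", "Android", "iOS"),
                      levels = c("Android", "iOS", "Windows")),
  GPS        = factor(c(1, 1, 0, 1, 0, 1)),
  PriceClass = factor(c("Cheap", "Expensive", "Cheap", "Expensive", "Cheap", "Expensive"))
)

# as.matrix() on mixed columns yields a character matrix, which is what
# leads to the "NAs introduced by coercion" warnings
is.character(as.matrix(toy[, names(toy) != "PriceClass"]))

# model.matrix() expands factors into 0/1 dummy columns; drop the intercept
x <- model.matrix(PriceClass ~ ., data = toy)[, -1]
y <- toy$PriceClass
colnames(x)

# With glmnet installed, a ridge fit would then look like this (not run here):
# fit <- glmnet::glmnet(x, y, family = "binomial", alpha = 0, lambda = 0.01)
# predict(fit, newx = x_test, type = "response")  # x_test built the same way
```

Building the test matrix with the same formula also makes it straightforward to evaluate the penalized model on PhonesTest rather than on the training data.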
The ROC curve shows the true positive rate against the false positive rate across different thresholds:
model <- lda(PriceClass ~ ., data=PhonesTrain, prior = c(.9, .1))
probability = predict(model, PhonesTest)$posterior
roc.lda <- roc(PhonesTest$PriceClass,probability[,2])
## Setting levels: control = Cheap, case = Expensive
## Setting direction: controls < cases
auc(roc.lda)
## Area under the curve: 0.9008
plot.roc(PhonesTest$PriceClass, probability[,2],col="darkblue", print.auc = TRUE, auc.polygon=TRUE, grid=c(0.1, 0.2),
grid.col=c("green", "red"), max.auc.polygon=TRUE,auc.polygon.col="lightblue", print.thres=TRUE, legacy.axes = TRUE)
## Setting levels: control = Cheap, case = Expensive
## Setting direction: controls < cases
The AUC is about 0.90, which means that the model separates the two classes well.
A threshold around 0.05 seems to be the most balanced one. However, a company may need a different one depending on its interests.
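For readers who want to see what the ROC curve and the AUC measure, here is a base-R sketch on simulated scores (the data and the helper `roc_point` are made up for illustration; the report itself relies on the `roc` and `auc` functions shown above).

```r
set.seed(42)
# Made-up scores: higher values should indicate the "Expensive" class
truth <- factor(rep(c("Cheap", "Expensive"), times = c(60, 40)))
score <- c(rbeta(60, 2, 5), rbeta(40, 5, 2))

# True/false positive rates when "Expensive" is predicted for score > threshold
roc_point <- function(threshold) {
  pred_pos <- score > threshold
  c(threshold = threshold,
    tpr = mean(pred_pos[truth == "Expensive"]),
    fpr = mean(pred_pos[truth == "Cheap"]))
}
roc_tab <- t(sapply(seq(0.1, 0.9, by = 0.2), roc_point))
roc_tab

# AUC as the probability that a random Expensive phone outscores a random Cheap one
r   <- rank(score)
n1  <- sum(truth == "Expensive")
n0  <- sum(truth == "Cheap")
auc_hat <- (sum(r[truth == "Expensive"]) - n1 * (n1 + 1) / 2) / (n1 * n0)
auc_hat
```

Each row of `roc_tab` is one point of the curve; raising the threshold lowers both rates, which is the trade-off a company has to weigh.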
We are going to start with an LDA (Linear Discriminant Analysis), where variance is reduced by introducing some bias.
lda.model1 <- lda(PriceClass ~ ., data=PhonesTrain, prior = c(3/5, 2/5))
lda.model1
## Call:
## lda(PriceClass ~ ., data = PhonesTrain, prior = c(3/5, 2/5))
##
## Prior probabilities of groups:
## Cheap Expensive
## 0.6 0.4
##
## Group means:
## Battery.capacity..mAh. Screen.size..inches. Touchscreen0 Resolution.x
## Cheap 2699.154 5.062718 0.02067947 681.7400
## Expensive 3307.139 5.650584 0.00243309 998.4088
## Resolution.y Processor RAM..MB. Internal.storage..GB. Rear.camera
## Cheap 1226.069 4.809453 1739.925 16.00481 9.008567
## Expensive 1898.706 6.722628 3656.934 53.63504 16.949392
## Front.camera Operating.systemBlackBerry Operating.systemCyanogen
## Cheap 4.929985 0.00295421 0.007385524
## Expensive 10.588078 0.00973236 0.002433090
## Operating.systemiOS Operating.systemSailfish Operating.systemTizen
## Cheap 0.001477105 0.001477105 0.00295421
## Expensive 0.036496350 0.000000000 0.00000000
## Operating.systemWindows Wi.Fi0 Bluetooth0 GPS0
## Cheap 0.01329394 0.008862629 0.01624815 0.09748892
## Expensive 0.01946472 0.002433090 0.00243309 0.03406326
## Number.of.SIMs2 Number.of.SIMs3 X3G0 X4G..LTE0
## Cheap 0.8847858 0.001477105 0.11225997 0.3190547
## Expensive 0.7396594 0.000000000 0.09489051 0.1630170
##
## Coefficients of linear discriminants:
## LD1
## Battery.capacity..mAh. -2.251542e-05
## Screen.size..inches. -1.161382e-01
## Touchscreen0 8.879698e-02
## Resolution.x 1.576805e-03
## Resolution.y 4.155798e-04
## Processor 7.750866e-02
## RAM..MB. 2.282514e-04
## Internal.storage..GB. 5.439563e-03
## Rear.camera 1.245843e-02
## Front.camera 1.925067e-02
## Operating.systemBlackBerry 1.620904e+00
## Operating.systemCyanogen -9.204590e-02
## Operating.systemiOS 1.711426e+00
## Operating.systemSailfish -8.160616e-01
## Operating.systemTizen -5.620493e-02
## Operating.systemWindows 7.314360e-01
## Wi.Fi0 3.049030e-01
## Bluetooth0 -6.131765e-01
## GPS0 -6.193790e-01
## Number.of.SIMs2 -6.718253e-01
## Number.of.SIMs3 -8.304434e-01
## X3G0 -6.299292e-01
## X4G..LTE0 3.369768e-01
Note that prior = c(3/5, 2/5) is roughly the class proportions of the training set, hence the model is nearly equivalent to the default one:
lda.model2 <- lda(PriceClass ~ ., data=PhonesTrain)
lda.model2
## Call:
## lda(PriceClass ~ ., data = PhonesTrain)
##
## Prior probabilities of groups:
## Cheap Expensive
## 0.6222426 0.3777574
##
## Group means:
## Battery.capacity..mAh. Screen.size..inches. Touchscreen0 Resolution.x
## Cheap 2699.154 5.062718 0.02067947 681.7400
## Expensive 3307.139 5.650584 0.00243309 998.4088
## Resolution.y Processor RAM..MB. Internal.storage..GB. Rear.camera
## Cheap 1226.069 4.809453 1739.925 16.00481 9.008567
## Expensive 1898.706 6.722628 3656.934 53.63504 16.949392
## Front.camera Operating.systemBlackBerry Operating.systemCyanogen
## Cheap 4.929985 0.00295421 0.007385524
## Expensive 10.588078 0.00973236 0.002433090
## Operating.systemiOS Operating.systemSailfish Operating.systemTizen
## Cheap 0.001477105 0.001477105 0.00295421
## Expensive 0.036496350 0.000000000 0.00000000
## Operating.systemWindows Wi.Fi0 Bluetooth0 GPS0
## Cheap 0.01329394 0.008862629 0.01624815 0.09748892
## Expensive 0.01946472 0.002433090 0.00243309 0.03406326
## Number.of.SIMs2 Number.of.SIMs3 X3G0 X4G..LTE0
## Cheap 0.8847858 0.001477105 0.11225997 0.3190547
## Expensive 0.7396594 0.000000000 0.09489051 0.1630170
##
## Coefficients of linear discriminants:
## LD1
## Battery.capacity..mAh. -2.251542e-05
## Screen.size..inches. -1.161382e-01
## Touchscreen0 8.879698e-02
## Resolution.x 1.576805e-03
## Resolution.y 4.155798e-04
## Processor 7.750866e-02
## RAM..MB. 2.282514e-04
## Internal.storage..GB. 5.439563e-03
## Rear.camera 1.245843e-02
## Front.camera 1.925067e-02
## Operating.systemBlackBerry 1.620904e+00
## Operating.systemCyanogen -9.204590e-02
## Operating.systemiOS 1.711426e+00
## Operating.systemSailfish -8.160616e-01
## Operating.systemTizen -5.620493e-02
## Operating.systemWindows 7.314360e-01
## Wi.Fi0 3.049030e-01
## Bluetooth0 -6.131765e-01
## GPS0 -6.193790e-01
## Number.of.SIMs2 -6.718253e-01
## Number.of.SIMs3 -8.304434e-01
## X3G0 -6.299292e-01
## X4G..LTE0 3.369768e-01
In practice, a bit better performance is attained if we shrink the prior probabilities towards 1/3
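To illustrate how the prior enters the prediction, the sketch below fits `lda` twice on made-up one-variable data (`toy`, `ram` and `new_phone` are hypothetical) and compares the posterior of the same borderline phone under the default prior and under a prior that strongly favours Cheap.

```r
library(MASS)  # lda()

set.seed(7)
# Made-up two-class data with a 60/40 class balance
toy <- data.frame(
  cls = factor(rep(c("Cheap", "Expensive"), times = c(60, 40))),
  ram = c(rnorm(60, mean = 2), rnorm(40, mean = 4))
)

fit_default <- lda(cls ~ ram, data = toy)                       # prior = class shares
fit_shifted <- lda(cls ~ ram, data = toy, prior = c(0.9, 0.1))  # strong prior for Cheap

fit_default$prior

# The same borderline phone gets a lower posterior for "Expensive"
# once the prior leans towards "Cheap"
new_phone    <- data.frame(ram = 3)
post_default <- predict(fit_default, new_phone)$posterior
post_shifted <- predict(fit_shifted, new_phone)$posterior
rbind(default = post_default[1, ], shifted = post_shifted[1, ])
```

When no `prior` is given, `lda` uses the class proportions of the training data, which is why the two fits above in the report print the same group means and coefficients.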
Output: posterior probabilities
probability = predict(lda.model2, newdata=PhonesTest)$posterior
head(probability)
## Cheap Expensive
## 7 0.0003965069 0.9996035
## 8 0.0002278220 0.9997722
## 9 0.0017598720 0.9982401
## 12 0.0395493058 0.9604507
## 17 0.0043647393 0.9956353
## 18 0.0780282858 0.9219717
To predict the price class labels, we apply the Bayes rule of maximum posterior probability:
prediction <- max.col(probability)
head(prediction)
## [1] 2 2 2 2 2 2
which is equivalent to
prediction = predict(lda.model2, newdata=PhonesTest)$class
head(prediction)
## [1] Expensive Expensive Expensive Expensive Expensive Expensive
## Levels: Cheap Expensive
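The two results above differ only in representation: `max.col` returns column indices, while `$class` returns labels. A small sketch of the mapping, using the first three posterior rows printed above (the object names `post`, `idx` and `label` are ours):

```r
# Posterior probabilities copied from the output above (first three test phones)
post <- matrix(c(0.0003965069, 0.9996035,
                 0.0002278220, 0.9997722,
                 0.0017598720, 0.9982401),
               ncol = 2, byrow = TRUE,
               dimnames = list(c("7", "8", "9"), c("Cheap", "Expensive")))

# max.col() returns column indices; index the column names to get labels back
idx   <- max.col(post, ties.method = "first")
label <- factor(colnames(post)[idx], levels = colnames(post))
label
```

Setting `ties.method = "first"` is our choice here; by default `max.col` breaks ties at random.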
Performance
The confusion matrix: predictions in rows, true values in columns (but we can change the order)
conf_lda_matrix = confusionMatrix(prediction, PhonesTest$PriceClass)$table
conf_lda_matrix
## Reference
## Prediction Cheap Expensive
## Cheap 145 30
## Expensive 24 72
confusionMatrix(prediction, PhonesTest$PriceClass)$overall[1]
## Accuracy
## 0.800738
#qda.model1 <- qda(PriceClass ~ ., data=PhonesTrain, prior = c(3/5, 2/5))
#qda.model1
#qda.model2 <- qda(PriceClass ~ ., data=PhonesTest, prior = c(3/5, 2/5))
#qda.model2
Performance:
#prediction = predict(qda.model2, newdata=PhonesTest)$class
#confusionMatrix(prediction, PhonesTest$PriceClass)$table
#confusionMatrix(prediction, PhonesTest$PriceClass)$overall[1]
We have many predictors, hence our benchmark will be the penalized logistic regression
ctrl <- trainControl(method = "cv", number = 5,
classProbs = TRUE,
verboseIter=T)
# We have many predictors, hence use penalized logistic regression
lrFit <- train(PriceClass ~ .,
method = "glmnet",
tuneGrid = expand.grid(alpha = seq(0, 1, 0.1),
lambda = seq(0, .1, 0.02)),
metric = "Kappa",
data = PhonesTrain,
preProcess = c("center", "scale"),
trControl = ctrl)
## + Fold1: alpha=0.0, lambda=0.1
## - Fold1: alpha=0.0, lambda=0.1
## + Fold2: alpha=0.0, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: alpha=0.0, lambda=0.1
## + Fold3: alpha=0.0, lambda=0.1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold3: alpha=0.0, lambda=0.1
## + Fold4: alpha=0.0, lambda=0.1
## - Fold4: alpha=0.0, lambda=0.1
## + Fold5: alpha=0.0, lambda=0.1
## - Fold5: alpha=0.0, lambda=0.1
## Aggregating results
## Selecting tuning parameters
## Fitting alpha = 0.9, lambda = 0 on full training set
print(lrFit)
## glmnet
##
## 1088 samples
## 17 predictor
## 2 classes: 'Cheap', 'Expensive'
##
## Pre-processing: centered (23), scaled (23)
## Resampling: Cross-Validated (5 fold)
## Summary of sample sizes: 870, 870, 870, 871, 871
## Resampling results across tuning parameters:
##
## alpha lambda Accuracy Kappa
## 0.0 0.00 0.8170887 0.6019786
## 0.0 0.02 0.8170887 0.6019786
## 0.0 0.04 0.8170972 0.6023910
## 0.0 0.06 0.8143449 0.5957725
## 0.0 0.08 0.8115926 0.5891784
## 0.0 0.10 0.8125100 0.5913735
## 0.1 0.00 0.8244366 0.6170251
## 0.1 0.02 0.8180104 0.6037317
## 0.1 0.04 0.8161882 0.5997617
## 0.1 0.06 0.8161924 0.5989857
## 0.1 0.08 0.8171141 0.6007498
## 0.1 0.10 0.8171057 0.6007514
## 0.2 0.00 0.8244366 0.6170251
## 0.2 0.02 0.8207669 0.6099444
## 0.2 0.04 0.8171057 0.6015591
## 0.2 0.06 0.8207838 0.6095693
## 0.2 0.08 0.8161798 0.5985819
## 0.2 0.10 0.8189236 0.6031810
## 0.3 0.00 0.8244366 0.6170251
## 0.3 0.02 0.8216886 0.6117504
## 0.3 0.04 0.8198579 0.6073736
## 0.3 0.06 0.8189321 0.6047862
## 0.3 0.08 0.8161671 0.5968919
## 0.3 0.10 0.8161713 0.5952188
## 0.4 0.00 0.8244366 0.6170251
## 0.4 0.02 0.8189321 0.6052241
## 0.4 0.04 0.8216928 0.6110114
## 0.4 0.06 0.8198495 0.6062537
## 0.4 0.08 0.8161713 0.5952188
## 0.4 0.10 0.8106498 0.5810254
## 0.5 0.00 0.8244366 0.6170251
## 0.5 0.02 0.8198537 0.6066710
## 0.5 0.04 0.8189321 0.6040335
## 0.5 0.06 0.8198495 0.6050769
## 0.5 0.08 0.8143322 0.5905050
## 0.5 0.10 0.8060542 0.5688001
## 0.6 0.00 0.8244366 0.6170251
## 0.6 0.02 0.8207711 0.6088569
## 0.6 0.04 0.8180146 0.6022325
## 0.6 0.06 0.8152581 0.5939561
## 0.6 0.08 0.8088234 0.5769884
## 0.6 0.10 0.7996322 0.5540875
## 0.7 0.00 0.8244366 0.6170251
## 0.7 0.02 0.8216886 0.6107029
## 0.7 0.04 0.8189363 0.6041125
## 0.7 0.06 0.8161713 0.5953406
## 0.7 0.08 0.8023929 0.5612103
## 0.7 0.10 0.8014670 0.5573539
## 0.8 0.00 0.8244366 0.6170251
## 0.8 0.02 0.8216886 0.6107029
## 0.8 0.04 0.8180104 0.6014300
## 0.8 0.06 0.8088192 0.5784075
## 0.8 0.08 0.7996322 0.5540875
## 0.8 0.10 0.7977889 0.5477813
## 0.9 0.00 0.8253541 0.6188414
## 0.9 0.02 0.8207669 0.6088619
## 0.9 0.04 0.8161755 0.5970105
## 0.9 0.06 0.8042193 0.5669729
## 0.9 0.08 0.7977889 0.5482597
## 0.9 0.10 0.7959455 0.5427031
## 1.0 0.00 0.8253541 0.6188414
## 1.0 0.02 0.8226060 0.6132162
## 1.0 0.04 0.8143322 0.5929705
## 1.0 0.06 0.8005496 0.5580273
## 1.0 0.08 0.7959455 0.5436850
## 1.0 0.10 0.7950281 0.5404026
##
## Kappa was used to select the optimal model using the largest value.
## The final values used for the model were alpha = 0.9 and lambda = 0.
lrPred = predict(lrFit, PhonesTest)
confusionMatrix(lrPred, PhonesTest$PriceClass)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Cheap Expensive
## Cheap 148 30
## Expensive 21 72
##
## Accuracy : 0.8118
## 95% CI : (0.7601, 0.8566)
## No Information Rate : 0.6236
## P-Value [Acc > NIR] : 1.418e-11
##
## Kappa : 0.592
##
## Mcnemar's Test P-Value : 0.2626
##
## Sensitivity : 0.8757
## Specificity : 0.7059
## Pos Pred Value : 0.8315
## Neg Pred Value : 0.7742
## Prevalence : 0.6236
## Detection Rate : 0.5461
## Detection Prevalence : 0.6568
## Balanced Accuracy : 0.7908
##
## 'Positive' Class : Cheap
##
The accuracy obtained is around 81% and Kappa around 0.59, not bad but should be improved.
A technology resale company needs to know whether buying a second-hand phone would be profitable. Therefore, we need to set some guidelines to determine, as well as possible, the price class of a phone without knowing what a person would actually pay for it.
We decided to assume the following costs for each possible outcome:
Cost of a true cheap phone is 0: the company buys the phone and sells it at the price we estimated.
Cost of predicting Cheap for a phone that is actually Expensive is 70: the company sells the phone cheaply when people would have paid more for it.
Cost of predicting Expensive for a phone that is actually Cheap is 200 (the most problematic error): the company pays a lot for a phone that will not sell unless the price is lowered.
Cost of a true expensive phone is 0: the company buys the phone and sells it at the price we estimated.
Cost matrix:
| Prediction/Reality | Cheap | Expensive |
|---|---|---|
| Cheap | 0 | 70 |
| Expensive | 200 | 0 |
Unit cost is then:
0*TN + 70*FP + 200*FN + 0*TP
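# Unit costs in the order used by as.vector(CM), i.e. column by column:
# (pred Cheap, ref Cheap), (pred Expensive, ref Cheap), (pred Cheap, ref Expensive), (pred Expensive, ref Expensive)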
cost.unit <- c(0, 70, 200, 0)
Therefore, the unit cost for the naive classifier (no analytics knowledge), which labels every phone as Cheap, would be:
cost = 0*0.62 + 200*0 + 70*0.38 + 0*0 = 26.6, i.e. about 27 eur/phone on average
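The naive cost can be checked, and compared with the opposite naive rule, by multiplying the cost table by the class shares. The sketch below takes the costs from the table above and the rounded shares 0.62 and 0.38 from the text; `cost_mat`, `share` and `naive_cost` are names introduced only for this illustration.

```r
# Cost matrix from the table above (rows = prediction, columns = reality)
cost_mat <- matrix(c(0, 200, 70, 0), nrow = 2,
                   dimnames = list(Prediction = c("Cheap", "Expensive"),
                                   Reality    = c("Cheap", "Expensive")))

# Approximate class shares used in the text
share <- c(Cheap = 0.62, Expensive = 0.38)

# Expected cost per phone of a rule that always predicts one class
naive_cost <- function(always) sum(cost_mat[always, names(share)] * share)

naive_cost("Cheap")      # 70 * 0.38
naive_cost("Expensive")  # 200 * 0.62
```

Under the table's costs, always predicting Cheap is the cheaper of the two naive rules, which is why it serves as the baseline.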
However, let's study whether we can reduce this cost:
Let’s use the threshold from the ROC curve, which was 0.05
threshold = 0.05
lrProb = predict(lrFit, PhonesTest, type="prob")
lrPred = rep("Cheap", nrow(PhonesTest))
lrPred[which(lrProb[,2] > threshold)] = "Expensive"
confusionMatrix(factor(lrPred), PhonesTest$PriceClass)
## Confusion Matrix and Statistics
##
## Reference
## Prediction Cheap Expensive
## Cheap 42 0
## Expensive 127 102
##
## Accuracy : 0.5314
## 95% CI : (0.47, 0.592)
## No Information Rate : 0.6236
## P-Value [Acc > NIR] : 0.9992
##
## Kappa : 0.1993
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.2485
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 0.4454
## Prevalence : 0.6236
## Detection Rate : 0.1550
## Detection Prevalence : 0.1550
## Balanced Accuracy : 0.6243
##
## 'Positive' Class : Cheap
##
Now we compute the cost per phone
CM = confusionMatrix(factor(lrPred), PhonesTest$PriceClass)$table
cost = sum(as.vector(CM)*cost.unit)/sum(CM)
cost
## [1] 32.80443
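Because the cost is computed as `as.vector(CM) * cost.unit`, it matters which cell each unit cost is paired with. The sketch below rebuilds the confusion matrix printed above and lists the pairing explicitly; `cells` is a helper data frame introduced only for this illustration.

```r
# Confusion matrix shown above for threshold 0.05 (rows = prediction, columns = reference)
CM <- matrix(c(42, 127, 0, 102), nrow = 2,
             dimnames = list(Prediction = c("Cheap", "Expensive"),
                             Reference  = c("Cheap", "Expensive")))
cost.unit <- c(0, 70, 200, 0)

# as.vector() walks down the first column, then the second;
# expand.grid() enumerates the cells in the same order
cells <- expand.grid(Prediction = rownames(CM), Reference = colnames(CM))
cells$count     <- as.vector(CM)
cells$unit_cost <- cost.unit
cells

cost_check <- sum(cells$count * cells$unit_cost) / sum(CM)
cost_check  # reproduces the 32.80443 above
```

With this ordering, the 70 is applied to phones predicted Expensive that are actually Cheap, and the 200 to phones predicted Cheap that are actually Expensive. If the cost table is read with predictions in rows, flattening it column by column would instead give `c(0, 200, 70, 0)`, so it is worth checking which pairing is intended before interpreting the costs.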
The cost per phone obtained is about 33 eur, greater than the naive one. This tells us that the threshold suggested by the ROC curve is not necessarily the best option for our company. Instead, we must find the threshold value that minimizes the cost per phone.
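The idea of searching for a cost-minimizing threshold can be sketched on simulated probabilities before applying it to the real models. Everything in the block below is made up for illustration (`truth`, `p_exp`, `cost_at`), except the `cost.unit` vector, which is the one used in the report.

```r
set.seed(2024)
# Made-up test set: true classes and predicted probabilities of "Expensive"
truth <- factor(rep(c("Cheap", "Expensive"), times = c(120, 80)),
                levels = c("Cheap", "Expensive"))
p_exp <- c(rbeta(120, 2, 5), rbeta(80, 5, 2))
cost.unit <- c(0, 70, 200, 0)  # same vector as in the report

# Average cost per phone at a given threshold
cost_at <- function(threshold) {
  pred <- factor(ifelse(p_exp > threshold, "Expensive", "Cheap"),
                 levels = c("Cheap", "Expensive"))
  CM <- table(Prediction = pred, Reference = truth)
  sum(as.vector(CM) * cost.unit) / sum(CM)
}

thresholds <- seq(0.05, 0.95, by = 0.05)
costs      <- sapply(thresholds, cost_at)
best       <- thresholds[which.min(costs)]
data.frame(threshold = thresholds, cost = round(costs, 2))
best
```

Fixing the factor levels of the predictions keeps the table 2 x 2 even when a threshold sends every phone to the same class, so the unit costs always line up with the right cells.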
We tried the raw models with a fixed threshold to see our starting point:
paste0("Logistic Regression model costs: ",sum(as.vector(conf_log_reg$table)*cost.unit)/sum(conf_log_reg$table))
## [1] "Logistic Regression model costs: 26.8265682656827"
paste0("Penalized Logistic Regression model costs: ",sum(as.vector(conf_p_log_reg$table)*cost.unit)/sum(conf_p_log_reg$table))
## [1] "Penalized Logistic Regression model costs: 27.1599264705882"
paste0("ROC curve costs: ", cost)
## [1] "ROC curve costs: 32.8044280442804"
paste0("LDA model costs: ",sum(as.vector(conf_lda_matrix)*cost.unit)/sum(conf_lda_matrix))
## [1] "LDA model costs: 28.3394833948339"
The best one was the logistic regression model, with a cost of 26.83 eur/phone. However, this cost was obtained with a fixed threshold of 0.5, so by optimizing the threshold we can look for the best logistic regression model for our prediction:
cost.i = matrix(NA, nrow = 100, ncol = 10)
# 100 replicates of training/testing sets for each of the 10 threshold values
j <- 0
for (threshold in seq(0.05,0.5,0.05)){
j <- j + 1
cat(j)
for(i in 1:100){
# partition data intro training (80%) and testing sets (20%)
d <- createDataPartition(PhonesTrain$PriceClass, p = 0.8, list = FALSE)
# select training sample
train <- PhonesTrain[d,]
test <- PhonesTrain[-d,]
lrFit <- train(PriceClass ~ ., data=train, method = "glmnet",
tuneGrid = data.frame(alpha = 0.3, lambda = 0),
preProcess = c("center",
"scale"),
trControl = trainControl(method = "none", classProbs = TRUE))
lrProb = predict(lrFit, test, type="prob")
lrPred = rep("Cheap", nrow(test))
lrPred[which(lrProb[,2] > threshold)] = "Expensive"
CM = confusionMatrix(factor(lrPred), test$PriceClass)$table
cost = sum(as.vector(CM)*cost.unit)/sum(CM)
cost
cost.i[i,j] <- cost
}
}
## 1
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemTizen,
## Number.of.SIMs3
## 2
## 3
## 4
## 5
## 6
## 7
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## 8
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## 9
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## 10
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
# Threshold optimization:
boxplot(cost.i, main = "Threshold selection",
ylab = "unit cost",
xlab = "threshold value",
names = seq(0.05,0.5,0.05),col="royalblue2",las=2)
# values around 0.2 are reasonable
apply(cost.i, 2, median)
## [1] 33.50230 26.61290 22.69585 22.94931 22.48848 23.15668 24.10138 25.89862
## [9] 25.69124 27.05069
The lowest median cost is reached at a threshold of 0.25, with about 22.5 eur/phone. Neighbouring thresholds between 0.15 and 0.30 give similar costs, while very low or high thresholds are clearly worse. This is lower than the cost we got with the raw classifiers.
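As a minimal sketch, the threshold can also be read off programmatically instead of by eye, by pairing the medians printed above with the threshold grid used for the boxplot labels. The names `thresholds`, `med_cost` and `best` are introduced here only for illustration; in the analysis itself the medians come from `apply(cost.i, 2, median)`.

```r
# Threshold grid used for the boxplot labels
thresholds <- seq(0.05, 0.5, 0.05)

# Median cost per threshold, copied from the output of apply(cost.i, 2, median)
med_cost <- c(33.50230, 26.61290, 22.69585, 22.94931, 22.48848,
              23.15668, 24.10138, 25.89862, 25.69124, 27.05069)

# Threshold with the smallest median cost
best <- thresholds[which.min(med_cost)]
best
```

Because the medians at 0.15, 0.20 and 0.25 differ by less than 0.5 eur/phone, the exact winner may change from one resampling run to another.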
The final prediction using this threshold is:
threshold = 0.25
lrFit <- train(PriceClass ~ ., data=PhonesTrain, method = "glmnet",
tuneGrid = data.frame(alpha = 0.3, lambda = 0), preProcess = c("center", "scale"),
trControl = trainControl(method = "none", classProbs = TRUE))
lrProb = predict(lrFit, PhonesTest, type="prob")
lrPred = rep("Cheap", nrow(PhonesTest))
lrPred[which(lrProb[,2] > threshold)] = "Expensive"
CM = confusionMatrix(factor(lrPred), PhonesTest$PriceClass)$table
cost = sum(as.vector(CM)*cost.unit)/sum(CM)
cost
## [1] 18.45018
We obtained a cost of 18.5 eur/phone on the test set. This is the lowest cost obtained so far, so we set the threshold at 0.25.
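The cost formula used above can be illustrated on a toy example. This is only a sketch under stated assumptions: the labels, the probabilities and the cost vector `cost_unit_demo` are made up for illustration and are not the `cost.unit` values used in this study. It also uses base R `table()` instead of `confusionMatrix()` so that it runs without extra packages.

```r
# Hypothetical true classes and predicted probabilities of "Expensive"
lvls <- c("Cheap", "Expensive")
actual <- factor(c("Cheap", "Cheap", "Cheap", "Expensive", "Expensive", "Cheap"),
                 levels = lvls)
prob_expensive <- c(0.10, 0.30, 0.20, 0.80, 0.22, 0.60)

# Classify as "Expensive" whenever the probability exceeds the threshold
threshold <- 0.25
pred <- factor(ifelse(prob_expensive > threshold, "Expensive", "Cheap"),
               levels = lvls)

# Rows are predictions, columns are true classes (same layout as caret)
CM <- table(Prediction = pred, Reference = actual)

# as.vector() flattens column by column:
# (pred Cheap, true Cheap), (pred Expensive, true Cheap),
# (pred Cheap, true Expensive), (pred Expensive, true Expensive)
cost_unit_demo <- c(0, 10, 50, 0)  # hypothetical unit costs, not the study's

# Average cost per phone
cost <- sum(as.vector(CM) * cost_unit_demo) / sum(CM)
cost
```

Fixing the factor levels explicitly keeps the table 2 x 2 even if a threshold sends every phone to the same class, so the element-wise product with the cost vector stays aligned.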
library(rpart)
# Hyper-parameters
control = rpart.control(minsplit = 30, maxdepth = 10, cp=0.01)
# minsplit: minimum number of observations a node must have before a split is attempted
# maxdepth: maximum depth of any node of the final tree
# cp: complexity parameter; the smaller it is, the more branches are kept
A decision tree
model = PriceClass ~.
dtFit <- rpart(model, data=PhonesTrain, method = "class", control = control)
summary(dtFit)
## Call:
## rpart(formula = model, data = PhonesTrain, method = "class",
## control = control)
## n= 1088
##
## CP nsplit rel error xerror xstd
## 1 0.48418491 0 1.0000000 1.0000000 0.03890980
## 2 0.01703163 1 0.5158151 0.5279805 0.03206880
## 3 0.01459854 3 0.4817518 0.5255474 0.03201318
## 4 0.01000000 4 0.4671533 0.5085158 0.03161632
##
## Variable importance
## Resolution.y Resolution.x RAM..MB.
## 22 21 13
## Internal.storage..GB. Processor Rear.camera
## 13 12 12
## Screen.size..inches. Battery.capacity..mAh. Operating.system
## 3 2 1
##
## Node number 1: 1088 observations, complexity param=0.4841849
## predicted class=Cheap expected loss=0.3777574 P(node) =1
## class counts: 677 411
## probabilities: 0.622 0.378
## left son=2 (717 obs) right son=3 (371 obs)
## Primary splits:
## Resolution.y < 1532 to the left, improve=171.6386, (0 missing)
## Resolution.x < 735 to the left, improve=168.8961, (0 missing)
## RAM..MB. < 3500 to the left, improve=150.6035, (0 missing)
## Internal.storage..GB. < 24 to the left, improve=147.3806, (0 missing)
## Rear.camera < 8.35 to the left, improve=127.4458, (0 missing)
## Surrogate splits:
## Resolution.x < 1052 to the left, agree=0.978, adj=0.935, (0 split)
## RAM..MB. < 2500 to the left, agree=0.835, adj=0.515, (0 split)
## Internal.storage..GB. < 24 to the left, agree=0.830, adj=0.501, (0 split)
## Processor < 5 to the left, agree=0.826, adj=0.491, (0 split)
## Rear.camera < 13.05 to the left, agree=0.803, adj=0.423, (0 split)
##
## Node number 2: 717 observations, complexity param=0.01703163
## predicted class=Cheap expected loss=0.1757322 P(node) =0.6590074
## class counts: 591 126
## probabilities: 0.824 0.176
## left son=4 (449 obs) right son=5 (268 obs)
## Primary splits:
## Rear.camera < 8.35 to the left, improve=19.93860, (0 missing)
## Processor < 5 to the left, improve=14.56142, (0 missing)
## Resolution.x < 510 to the left, improve=13.42088, (0 missing)
## Internal.storage..GB. < 24 to the left, improve=13.17083, (0 missing)
## RAM..MB. < 3500 to the left, improve=10.42950, (0 missing)
## Surrogate splits:
## RAM..MB. < 1500 to the left, agree=0.805, adj=0.478, (0 split)
## Screen.size..inches. < 5.1 to the left, agree=0.796, adj=0.455, (0 split)
## Processor < 5 to the left, agree=0.784, adj=0.422, (0 split)
## Internal.storage..GB. < 24 to the left, agree=0.775, adj=0.399, (0 split)
## Battery.capacity..mAh. < 2915 to the left, agree=0.762, adj=0.362, (0 split)
##
## Node number 3: 371 observations
## predicted class=Expensive expected loss=0.2318059 P(node) =0.3409926
## class counts: 86 285
## probabilities: 0.232 0.768
##
## Node number 4: 449 observations
## predicted class=Cheap expected loss=0.08463252 P(node) =0.4126838
## class counts: 411 38
## probabilities: 0.915 0.085
##
## Node number 5: 268 observations, complexity param=0.01703163
## predicted class=Cheap expected loss=0.3283582 P(node) =0.2463235
## class counts: 180 88
## probabilities: 0.672 0.328
## left son=10 (254 obs) right son=11 (14 obs)
## Primary splits:
## Screen.size..inches. < 4.815 to the right, improve=13.327070, (0 missing)
## Front.camera < 3.1 to the right, improve= 9.208692, (0 missing)
## RAM..MB. < 3500 to the left, improve= 4.744238, (0 missing)
## Rear.camera < 13.1 to the left, improve= 4.400181, (0 missing)
## Internal.storage..GB. < 48 to the left, improve= 3.304299, (0 missing)
## Surrogate splits:
## Battery.capacity..mAh. < 1980 to the right, agree=0.966, adj=0.357, (0 split)
## Resolution.x < 735 to the left, agree=0.966, adj=0.357, (0 split)
## Operating.system splits as L-LR--R, agree=0.966, adj=0.357, (0 split)
## Processor < 3 to the right, agree=0.963, adj=0.286, (0 split)
## Front.camera < 1.95 to the right, agree=0.959, adj=0.214, (0 split)
##
## Node number 10: 254 observations, complexity param=0.01459854
## predicted class=Cheap expected loss=0.2913386 P(node) =0.2334559
## class counts: 180 74
## probabilities: 0.709 0.291
## left son=20 (228 obs) right son=21 (26 obs)
## Primary splits:
## RAM..MB. < 3500 to the left, improve=6.082969, (0 missing)
## Processor < 6 to the left, improve=4.443180, (0 missing)
## Internal.storage..GB. < 48 to the left, improve=3.591884, (0 missing)
## Screen.size..inches. < 5.475 to the left, improve=2.791044, (0 missing)
## Front.camera < 14.5 to the left, improve=2.184549, (0 missing)
## Surrogate splits:
## Internal.storage..GB. < 48 to the left, agree=0.972, adj=0.731, (0 split)
## Rear.camera < 14.6 to the left, agree=0.917, adj=0.192, (0 split)
## Front.camera < 18 to the left, agree=0.909, adj=0.115, (0 split)
##
## Node number 11: 14 observations
## predicted class=Expensive expected loss=0 P(node) =0.01286765
## class counts: 0 14
## probabilities: 0.000 1.000
##
## Node number 20: 228 observations
## predicted class=Cheap expected loss=0.254386 P(node) =0.2095588
## class counts: 170 58
## probabilities: 0.746 0.254
##
## Node number 21: 26 observations
## predicted class=Expensive expected loss=0.3846154 P(node) =0.02389706
## class counts: 10 16
## probabilities: 0.385 0.615
library(rpart.plot)
## Warning: package 'rpart.plot' was built under R version 4.3.2
rpart.plot(dtFit, digits=3)
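To grow a full tree, we could set the complexity parameter cp to 0 (split even when the split does not improve the fit) and the minimum number of observations needed to split a node to its smallest value, 2. Here we use a less extreme setting (cp = 0.001, minsplit = 40, maxdepth = 12), which lowers cp from 0.01 to 0.001 and so allows more splits than before: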
control = rpart.control(minsplit = 40, maxdepth = 12, cp=0.001)
dtFit <- rpart(model, data=PhonesTrain, method = "class", control = control)
rpart.plot(dtFit, digits = 3)
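To see what cp = 0 and minsplit = 2 do in practice, here is a small sketch on the `kyphosis` data set that ships with rpart. That data set is a stand-in chosen only so the example is self-contained, and the helper `n_leaves` is a hypothetical name introduced here. The sketch compares the number of leaves of a fully grown tree with a tree grown under the default control settings.

```r
library(rpart)

# Fully grown tree: no complexity penalty, split nodes with as few as 2 cases
full_ctrl <- rpart.control(minsplit = 2, cp = 0, xval = 0)
full_tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class", control = full_ctrl)

# Tree grown with the default control (minsplit = 20, cp = 0.01)
default_tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                      method = "class")

# Count terminal nodes: leaves are marked "<leaf>" in the frame of the fit
n_leaves <- function(fit) sum(fit$frame$var == "<leaf>")

n_leaves(full_tree)
n_leaves(default_tree)
```

A fully grown tree usually fits the training data almost perfectly but tends to generalise worse, which is why a small positive cp is often preferred.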
Prediction:
dtPred <- predict(dtFit, PhonesTest, type = "class")
dtProb <- predict(dtFit, PhonesTest, type = "prob")
threshold = 0.3
dtPred = rep("Cheap", nrow(PhonesTest))
dtPred[which(dtProb[,2] > threshold)] = "Expensive"
CM = confusionMatrix(factor(dtPred), PhonesTest$PriceClass)$table
cost = sum(as.vector(CM)*cost.unit)/sum(CM)
cost
## [1] 18.81919
With the decision tree we obtain a cost of 18.8 eur/phone. The cost obtained with the cost-sensitive classifier is still slightly better (18.5 eur/phone).
Now we repeat the fit with caret, which tunes cp by 5-fold cross-validation:
library(caret)
caret.fit <- train(model, data = PhonesTrain,
method = "rpart",
control=rpart.control(minsplit = 40, maxdepth = 12),
trControl = trainControl(method = "cv", number = 5),
tuneLength=10)
# caret.fit
Visualization
rpart.plot(caret.fit$finalModel)
Prediction
dtProb <- predict(caret.fit, PhonesTest, type = "prob")
threshold = 0.3
dtPred = rep("Cheap", nrow(PhonesTest))
dtPred[which(dtProb[,2] > threshold)] = "Expensive"
CM = confusionMatrix(factor(dtPred), PhonesTest$PriceClass)$table
cost = sum(as.vector(CM)*cost.unit)/sum(CM)
cost
## [1] 18.85609
We obtained about 18.9 eur/phone, practically the same as before, so this method brought no improvement.
Let's now try a random forest.
rf.train <- randomForest(PriceClass ~., data=PhonesTrain,
ntree=200,
mtry=10,
cutoff=c(0.75,0.25),
importance=TRUE,
do.trace=T)
## ntree OOB 1 2
## 1: 20.80% 15.20% 30.20%
## 2: 22.78% 20.80% 25.88%
## 3: 21.92% 22.75% 20.58%
## 4: 21.68% 23.52% 18.70%
## 5: 22.87% 24.92% 19.52%
## 6: 23.75% 26.84% 18.65%
## 7: 23.35% 26.31% 18.48%
## 8: 23.89% 27.85% 17.49%
## 9: 24.49% 28.81% 17.44%
## 10: 24.17% 28.21% 17.56%
## 11: 23.68% 28.02% 16.59%
## 12: 23.27% 27.64% 16.10%
## 13: 22.58% 27.60% 14.36%
## 14: 23.13% 27.60% 15.82%
## 15: 22.72% 27.51% 14.84%
## 16: 22.54% 27.22% 14.84%
## 17: 23.00% 27.66% 15.33%
## 18: 22.72% 27.22% 15.33%
## 19: 23.90% 28.80% 15.82%
## 20: 23.53% 27.92% 16.30%
## 21: 23.44% 28.06% 15.82%
## 22: 23.71% 28.21% 16.30%
## 23: 24.08% 28.80% 16.30%
## 24: 24.36% 29.10% 16.55%
## 25: 23.99% 28.66% 16.30%
## 26: 23.99% 28.95% 15.82%
## 27: 23.44% 28.36% 15.33%
## 28: 23.53% 28.51% 15.33%
## 29: 23.62% 28.66% 15.33%
## 30: 22.98% 27.77% 15.09%
## 31: 23.25% 27.92% 15.57%
## 32: 23.53% 27.92% 16.30%
## 33: 23.44% 27.92% 16.06%
## 34: 23.62% 28.06% 16.30%
## 35: 23.25% 27.47% 16.30%
## 36: 23.53% 28.06% 16.06%
## 37: 23.62% 28.21% 16.06%
## 38: 23.25% 27.92% 15.57%
## 39: 22.98% 27.62% 15.33%
## 40: 23.16% 27.92% 15.33%
## 41: 22.98% 27.62% 15.33%
## 42: 22.61% 27.47% 14.60%
## 43: 22.70% 27.62% 14.60%
## 44: 22.43% 27.77% 13.63%
## 45: 22.52% 27.47% 14.36%
## 46: 22.43% 27.62% 13.87%
## 47: 22.52% 27.62% 14.11%
## 48: 22.15% 26.88% 14.36%
## 49: 22.33% 27.18% 14.36%
## 50: 22.24% 26.59% 15.09%
## 51: 22.15% 26.88% 14.36%
## 52: 22.15% 26.88% 14.36%
## 53: 22.06% 26.88% 14.11%
## 54: 21.97% 27.03% 13.63%
## 55: 21.69% 26.88% 13.14%
## 56: 21.42% 26.44% 13.14%
## 57: 21.78% 27.03% 13.14%
## 58: 21.97% 27.18% 13.38%
## 59: 22.06% 27.33% 13.38%
## 60: 22.06% 26.88% 14.11%
## 61: 21.78% 26.74% 13.63%
## 62: 21.97% 27.03% 13.63%
## 63: 21.88% 26.88% 13.63%
## 64: 21.88% 26.59% 14.11%
## 65: 22.61% 27.62% 14.36%
## 66: 22.52% 27.62% 14.11%
## 67: 22.61% 27.77% 14.11%
## 68: 22.61% 27.77% 14.11%
## 69: 22.79% 27.92% 14.36%
## 70: 22.79% 27.62% 14.84%
## 71: 22.89% 27.92% 14.60%
## 72: 22.89% 27.77% 14.84%
## 73: 22.70% 27.33% 15.09%
## 74: 22.98% 27.77% 15.09%
## 75: 22.89% 27.92% 14.60%
## 76: 22.43% 27.62% 13.87%
## 77: 22.89% 27.92% 14.60%
## 78: 22.61% 27.77% 14.11%
## 79: 22.43% 27.33% 14.36%
## 80: 22.43% 27.33% 14.36%
## 81: 22.43% 27.33% 14.36%
## 82: 22.52% 27.62% 14.11%
## 83: 22.43% 27.18% 14.60%
## 84: 22.24% 27.03% 14.36%
## 85: 22.33% 27.18% 14.36%
## 86: 22.52% 27.62% 14.11%
## 87: 22.43% 27.47% 14.11%
## 88: 22.79% 27.77% 14.60%
## 89: 22.24% 27.18% 14.11%
## 90: 22.89% 27.77% 14.84%
## 91: 22.98% 27.92% 14.84%
## 92: 22.89% 27.92% 14.60%
## 93: 22.70% 27.47% 14.84%
## 94: 22.89% 27.77% 14.84%
## 95: 22.79% 27.62% 14.84%
## 96: 22.79% 27.77% 14.60%
## 97: 23.25% 28.51% 14.60%
## 98: 23.35% 28.51% 14.84%
## 99: 23.16% 28.36% 14.60%
## 100: 23.07% 28.21% 14.60%
## 101: 23.07% 28.21% 14.60%
## 102: 22.98% 28.06% 14.60%
## 103: 22.79% 27.92% 14.36%
## 104: 22.98% 27.92% 14.84%
## 105: 23.16% 28.36% 14.60%
## 106: 22.98% 28.06% 14.60%
## 107: 23.07% 28.21% 14.60%
## 108: 23.16% 28.36% 14.60%
## 109: 23.07% 28.21% 14.60%
## 110: 23.07% 28.21% 14.60%
## 111: 22.70% 27.77% 14.36%
## 112: 22.70% 27.77% 14.36%
## 113: 23.07% 28.36% 14.36%
## 114: 23.07% 28.36% 14.36%
## 115: 22.89% 28.06% 14.36%
## 116: 22.79% 27.92% 14.36%
## 117: 22.89% 28.06% 14.36%
## 118: 22.89% 28.21% 14.11%
## 119: 23.25% 28.80% 14.11%
## 120: 22.98% 28.36% 14.11%
## 121: 23.25% 28.66% 14.36%
## 122: 23.16% 28.66% 14.11%
## 123: 22.98% 28.36% 14.11%
## 124: 22.70% 28.06% 13.87%
## 125: 22.70% 28.06% 13.87%
## 126: 22.52% 27.77% 13.87%
## 127: 22.52% 27.77% 13.87%
## 128: 22.43% 27.62% 13.87%
## 129: 22.70% 28.06% 13.87%
## 130: 22.43% 27.47% 14.11%
## 131: 22.33% 27.33% 14.11%
## 132: 22.24% 27.33% 13.87%
## 133: 22.43% 27.62% 13.87%
## 134: 22.52% 27.62% 14.11%
## 135: 22.79% 27.92% 14.36%
## 136: 22.89% 28.21% 14.11%
## 137: 22.89% 28.21% 14.11%
## 138: 22.89% 28.06% 14.36%
## 139: 22.89% 27.92% 14.60%
## 140: 22.70% 27.77% 14.36%
## 141: 22.52% 27.77% 13.87%
## 142: 22.89% 28.36% 13.87%
## 143: 22.70% 28.06% 13.87%
## 144: 22.70% 28.06% 13.87%
## 145: 22.52% 27.77% 13.87%
## 146: 22.61% 27.92% 13.87%
## 147: 22.70% 28.06% 13.87%
## 148: 22.70% 28.06% 13.87%
## 149: 22.61% 27.92% 13.87%
## 150: 22.61% 27.92% 13.87%
## 151: 22.61% 27.92% 13.87%
## 152: 22.61% 28.06% 13.63%
## 153: 22.61% 28.06% 13.63%
## 154: 22.79% 28.21% 13.87%
## 155: 22.89% 28.21% 14.11%
## 156: 22.61% 27.77% 14.11%
## 157: 22.52% 27.77% 13.87%
## 158: 22.52% 27.62% 14.11%
## 159: 22.43% 27.47% 14.11%
## 160: 22.43% 27.47% 14.11%
## 161: 22.52% 27.62% 14.11%
## 162: 22.52% 27.62% 14.11%
## 163: 22.52% 27.62% 14.11%
## 164: 22.52% 27.77% 13.87%
## 165: 22.43% 27.62% 13.87%
## 166: 22.43% 27.77% 13.63%
## 167: 22.33% 27.47% 13.87%
## 168: 22.33% 27.33% 14.11%
## 169: 22.61% 27.77% 14.11%
## 170: 22.52% 27.62% 14.11%
## 171: 22.52% 27.77% 13.87%
## 172: 22.61% 27.77% 14.11%
## 173: 22.33% 27.47% 13.87%
## 174: 22.52% 27.62% 14.11%
## 175: 22.52% 27.77% 13.87%
## 176: 22.24% 27.33% 13.87%
## 177: 22.24% 27.33% 13.87%
## 178: 22.15% 27.18% 13.87%
## 179: 22.15% 27.18% 13.87%
## 180: 22.33% 27.47% 13.87%
## 181: 22.33% 27.47% 13.87%
## 182: 22.24% 27.33% 13.87%
## 183: 22.52% 27.62% 14.11%
## 184: 22.33% 27.33% 14.11%
## 185: 22.24% 27.18% 14.11%
## 186: 22.43% 27.47% 14.11%
## 187: 22.33% 27.47% 13.87%
## 188: 22.61% 27.77% 14.11%
## 189: 22.52% 27.62% 14.11%
## 190: 22.43% 27.62% 13.87%
## 191: 22.43% 27.47% 14.11%
## 192: 22.43% 27.62% 13.87%
## 193: 22.52% 27.77% 13.87%
## 194: 22.52% 27.77% 13.87%
## 195: 22.43% 27.62% 13.87%
## 196: 22.43% 27.62% 13.87%
## 197: 22.52% 27.62% 14.11%
## 198: 22.61% 27.77% 14.11%
## 199: 22.52% 27.62% 14.11%
## 200: 22.52% 27.62% 14.11%
# mtry: number of variables randomly sampled as candidates at each split
# ntree: number of trees to grow
# cutoff: cutoff probabilities in majority vote
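As a small illustration of the cutoff argument: according to the randomForest documentation, the winning class is the one with the largest ratio of vote proportion to cutoff. The sketch below reproduces that rule in base R on made-up vote proportions; it does not call randomForest itself, and the vote values are assumptions for illustration.

```r
class_names <- c("Cheap", "Expensive")
cutoff <- c(0.75, 0.25)  # same values as in the call above

# Made-up vote proportions for three phones (each row sums to 1).
votes <- rbind(c(0.9, 0.1),
               c(0.7, 0.3),
               c(0.5, 0.5))
colnames(votes) <- class_names

# Divide each column by its cutoff and pick the class with the largest ratio.
ratios <- sweep(votes, 2, cutoff, "/")
winner <- class_names[max.col(ratios, ties.method = "first")]
winner
```

With these cutoffs, a phone is labelled Expensive as soon as more than 25% of the trees vote for that class, which mirrors the low probability threshold used with the decision tree.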
Prediction
rf.pred <- predict(rf.train, newdata=PhonesTest)
CM = confusionMatrix(factor(rf.pred), PhonesTest$PriceClass)$table
cost = sum(as.vector(CM)*cost.unit)/sum(CM)
cost
## [1] 19.5941
In this case, the cost obtained is about 19.6 eur/phone. This is worse than the decision tree results (about 18.8 eur/phone), and the cost-sensitive classifier is still the best.
Now we are going to use caret to try to improve the random forest:
We define the specific function for the cost:
EconomicCost <- function(data, lev = NULL, model = NULL) {
y.pred = data$pred
y.true = data$obs
CM = confusionMatrix(y.pred, y.true)$table
out = sum(as.vector(CM)*cost.unit)/sum(CM)
names(out) <- c("EconomicCost")
out
}
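A summary function for caret receives a data frame with the columns obs and pred and must return a named numeric vector whose name matches the metric argument of train. The sketch below checks that contract on a toy data frame without running any training. ToyEconomicCost and toy_cost_unit are hypothetical stand-ins (the unit costs are invented), and base R's table replaces confusionMatrix so that the example runs on its own.

```r
# Hypothetical unit costs, in column-major order of the confusion table.
toy_cost_unit <- c(0, 10, 50, 5)

# Same signature that caret expects from a summaryFunction.
ToyEconomicCost <- function(data, lev = NULL, model = NULL) {
  if (is.null(lev)) lev <- levels(data$obs)
  # Rows are predictions, columns are observed classes.
  CM <- table(factor(data$pred, levels = lev),
              factor(data$obs,  levels = lev))
  c(EconomicCost = sum(as.vector(CM) * toy_cost_unit) / sum(CM))
}

# Toy held-out predictions: one phone in each cell of the confusion table.
toy <- data.frame(
  obs  = factor(c("Cheap", "Cheap", "Expensive", "Expensive"),
                levels = c("Cheap", "Expensive")),
  pred = factor(c("Cheap", "Expensive", "Cheap", "Expensive"),
                levels = c("Cheap", "Expensive"))
)

res <- ToyEconomicCost(toy, lev = levels(toy$obs))
res
```

Testing the function this way before passing it to trainControl makes it easier to spot a wrong cost ordering or a missing name on the returned vector.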
Now include this function in the Caret control:
ctrl <- trainControl(method = "cv",
number = 5,
classProbs = TRUE,
summaryFunction = EconomicCost,
verboseIter=T)
Now train a RF using Caret with the specific metric:
rf.train <- train(PriceClass ~.,
method = "rf",
data = PhonesTrain,
preProcess = c("center", "scale"),
ntree = 200,
cutoff=c(0.7,0.3),
tuneGrid = expand.grid(mtry=c(6,8,10)),
metric = "EconomicCost",
maximize = F,
trControl = ctrl)
## + Fold1: mtry= 6
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold1: mtry= 6
## + Fold1: mtry= 8
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold1: mtry= 8
## + Fold1: mtry=10
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Number.of.SIMs3
## - Fold1: mtry=10
## + Fold2: mtry= 6
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: mtry= 6
## + Fold2: mtry= 8
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: mtry= 8
## + Fold2: mtry=10
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: Operating.systemSailfish
## - Fold2: mtry=10
## + Fold3: mtry= 6
## - Fold3: mtry= 6
## + Fold3: mtry= 8
## - Fold3: mtry= 8
## + Fold3: mtry=10
## - Fold3: mtry=10
## + Fold4: mtry= 6
## - Fold4: mtry= 6
## + Fold4: mtry= 8
## - Fold4: mtry= 8
## + Fold4: mtry=10
## - Fold4: mtry=10
## + Fold5: mtry= 6
## - Fold5: mtry= 6
## + Fold5: mtry= 8
## - Fold5: mtry= 8
## + Fold5: mtry=10
## - Fold5: mtry=10
## Aggregating results
## Selecting tuning parameters
## Fitting mtry = 10 on full training set
Variable importance:
rf_imp <- varImp(rf.train, scale = F)
plot(rf_imp, scales = list(y = list(cex = .95)))
Prediction:
rfPred = predict(rf.train, newdata=PhonesTest)
CM = confusionMatrix(factor(rfPred), PhonesTest$PriceClass)$table
cost = sum(as.vector(CM)*cost.unit)/sum(CM)
cost
## [1] 18.52399
We can see that the cost is about 18.5 eur/phone. This is the best result among the tree-based models so far; however, we still need to try gradient boosting before choosing the best option.
GBM.train <- gbm(ifelse(PhonesTrain$PriceClass=="Cheap",0,1) ~.,
data=PhonesTrain,
distribution= "bernoulli",
n.trees=250,
shrinkage = 0.01,
interaction.depth=2,
n.minobsinnode = 8)
Prediction and cost
threshold = 0.3
gbmProb = predict(GBM.train, newdata=PhonesTest, n.trees=250, type="response")
gbmPred = rep("Cheap", nrow(PhonesTest))
gbmPred[which(gbmProb > threshold)] = "Expensive"
CM = confusionMatrix(factor(gbmPred), PhonesTest$PriceClass)$table
cost = sum(as.vector(CM)*cost.unit)/sum(CM)
cost
## [1] 18.37638
The cost is about 18.4 eur/phone, only marginally lower than that of the caret random forest (about 18.5 eur/phone).
Let's now try XGBoost with caret. First, define a grid for the hyperparameters:
xgb_grid = expand.grid(nrounds = c(500,1000),
eta = c(0.01, 0.001), # c(0.01,0.05,0.1)
max_depth = c(2, 4, 6),
gamma = 1,
colsample_bytree = c(0.2, 0.4),
min_child_weight = c(1,5),
subsample = 1 )
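Before training, it helps to count how many candidate models this grid defines. The sketch below rebuilds the same grid and counts its rows, both with and without nrounds. The per-fold figure rests on an assumption drawn from the training log, which lists only nrounds=1000: caret appears to obtain the nrounds=500 results from the larger fit rather than refitting.

```r
# Same grid as above, rebuilt so that the example runs on its own.
xgb_grid <- expand.grid(nrounds = c(500, 1000),
                        eta = c(0.01, 0.001),
                        max_depth = c(2, 4, 6),
                        gamma = 1,
                        colsample_bytree = c(0.2, 0.4),
                        min_child_weight = c(1, 5),
                        subsample = 1)

# Total number of hyperparameter combinations evaluated.
n_candidates <- nrow(xgb_grid)

# Distinct combinations once nrounds is ignored (assumed fits per fold).
n_without_nrounds <- nrow(unique(xgb_grid[, names(xgb_grid) != "nrounds"]))

n_folds <- 5  # matches number = 5 in trainControl
c(candidates = n_candidates,
  per_fold_fits = n_without_nrounds,
  cv_fits = n_without_nrounds * n_folds)
```

Under that assumption the cross-validation needs 120 boosted fits of 1000 rounds each, which is consistent with the several minutes of training time recorded in the log timestamps.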
Then train the model:
xgb.train = train(PriceClass ~ .,
data=PhonesTrain,
trControl = ctrl,
metric="EconomicCost",
maximize = F,
tuneGrid = xgb_grid,
preProcess = c("center", "scale"),
method = "xgbTree" )
## + Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:40:02] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:02] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:40:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:40:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:40:07] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:07] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:40:10] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:10] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:40:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:40:16] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:16] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:40:19] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:19] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:40:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:40:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:40:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:40:35] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:35] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:40:36] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:36] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:40:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:40:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:40:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:40:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:40:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:40:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:40:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:40:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:40:58] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:40:58] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:41:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:41:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold1: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:41:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:41:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:41:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:41:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:41:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:41:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:41:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:41:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:41:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:41:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:41:36] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:36] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:41:40] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:40] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:41:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:41:43] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:43] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:41:45] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:45] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:41:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:41:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:41:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:41:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:41:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:41:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:42:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:42:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:42:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:42:12] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:12] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold2: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:42:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:42:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:42:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:42:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:42:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:42:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:42:27] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:27] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:42:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:42:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:42:37] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:37] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:42:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:42:45] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:45] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:42:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:42:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:42:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:42:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:42:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:42:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:42:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:43:00] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:00] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:43:02] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:02] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:43:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:43:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:43:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:43:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold3: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:43:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:43:20] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:20] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:43:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:43:23] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:23] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:43:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:43:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:43:32] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:32] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:43:35] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:35] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:43:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:43:42] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:42] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:43:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:43:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:43:53] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:53] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:43:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:43:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:43:58] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:43:58] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:44:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:44:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:44:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:44:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:44:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:44:16] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:16] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:44:20] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:20] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:44:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold4: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:44:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:44:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:44:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:44:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:44:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:44:37] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:37] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:44:40] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:40] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:44:43] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:43] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:44:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:44:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:44:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:44:59] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:44:59] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.001, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:45:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:45:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:45:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:45:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=2, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:45:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:45:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:45:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:45:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=4, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## + Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## [09:45:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=1, subsample=1, nrounds=1000
## + Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## [09:45:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.2, min_child_weight=5, subsample=1, nrounds=1000
## + Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## [09:45:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=1, subsample=1, nrounds=1000
## + Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## [09:45:32] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:45:32] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## - Fold5: eta=0.010, max_depth=6, gamma=1, colsample_bytree=0.4, min_child_weight=5, subsample=1, nrounds=1000
## Aggregating results
## Selecting tuning parameters
## Fitting nrounds = 1000, max_depth = 6, eta = 0.01, gamma = 1, colsample_bytree = 0.4, min_child_weight = 1, subsample = 1 on full training set
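The hyperparameter values printed in the log above suggest the search grid that was explored. The following is a minimal base-R sketch that reconstructs it, assuming the grid contains exactly the values that appear in the log and that five folds were used (Fold5 is the highest fold label printed). The names `xgb_grid_sketch`, `n_combinations`, `n_folds` and `n_fits` are hypothetical, and the original grid definition may differ.

```r
# Sketch only: grid values are read off the training log above; the object
# names are hypothetical and the original tuneGrid definition may differ.
xgb_grid_sketch <- expand.grid(
  nrounds          = 1000,
  eta              = c(0.001, 0.01),
  max_depth        = c(2, 4, 6),
  gamma            = 1,
  colsample_bytree = c(0.2, 0.4),
  min_child_weight = c(1, 5),
  subsample        = 1
)

n_combinations <- nrow(xgb_grid_sketch)  # 2 * 3 * 2 * 2 combinations
n_folds        <- 5                      # assumed from the fold labels in the log
n_fits         <- n_combinations * n_folds  # models fitted during resampling
```

Under these assumptions each fold fits every combination once, which matches the pattern of `+ FoldK` / `- FoldK` pairs in the log.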
Variable importance:
# Unscaled variable importance of the tuned model
xgb_imp <- varImp(xgb.train, scale = FALSE)
plot(xgb_imp, scales = list(y = list(cex = .95)))
Prediction and cost:
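# Label a phone "Expensive" when the probability in the second column exceeds the threshold
threshold <- 0.3
xgbProb <- predict(xgb.train, newdata = PhonesTest, type = "prob")
xgbPred <- rep("Cheap", nrow(PhonesTest))
xgbPred[which(xgbProb[, 2] > threshold)] <- "Expensive"
# Reuse the reference levels so the table stays 2x2 even if one class is never predicted
xgbPred <- factor(xgbPred, levels = levels(PhonesTest$PriceClass))
CM <- confusionMatrix(xgbPred, PhonesTest$PriceClass)$table
# Average cost per phone: cell counts weighted by the unit costs in cost.unit
cost <- sum(as.vector(CM) * cost.unit) / sum(CM)
cost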
## [1] 20.4797
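The cost reported for this classifier is 17.5 eur/phone, which is the best output obtained so far. Note that the run printed above gives about 20.5 eur/phone, so the exact figure appears to vary between runs.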
Now that we have finished the classification section, we can say that we managed to improve the savings of the fictional company by reducing the cost of each phone.
By selecting the best classifier, Gradient Boosting, we achieved a cost of 17.5 eur/phone. Compared with the naïve classifier, which gave a cost of 33 eur/phone, or the raw logistic regression classifier, with 27 eur/phone, the improvement may not seem large. However, the difference of about 15 euros with respect to the naïve classifier, multiplied by just 1,000 phones, amounts to roughly 15,000 euros of savings. Therefore, for a big company this small change can make a real difference.
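The savings argument above is plain arithmetic and can be checked with a short sketch. The helper names cost_per_unit and total_savings, and the toy confusion and cost matrices, are illustrative assumptions. Only the 33, 27 and 17.5 eur/phone figures come from the text.

```r
# Average cost per phone: confusion counts weighted by unit costs,
# divided by the number of phones (same idea as the `cost` line above).
cost_per_unit <- function(cm, unit_cost) sum(cm * unit_cost) / sum(cm)

# Made-up confusion matrix and unit-cost matrix, for illustration only.
toy_cm   <- matrix(c(50, 10, 5, 35), nrow = 2)
toy_cost <- matrix(c(0, 100, 20, 0), nrow = 2)
toy_avg  <- cost_per_unit(toy_cm, toy_cost)

# Savings obtained when moving from one classifier to another over n phones.
total_savings <- function(cost_old, cost_new, n) (cost_old - cost_new) * n

savings_vs_naive    <- total_savings(33, 17.5, 1000)  # naive vs gradient boosting
savings_vs_logistic <- total_savings(27, 17.5, 1000)  # logistic vs gradient boosting
print(c(naive = savings_vs_naive, logistic = savings_vs_logistic))
```

With the figures quoted in the text, the gap with respect to the naïve classifier is 15.5 eur/phone, which the paragraph rounds to 15.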
To sum up, being able to use these classification tools and select the best one in a real company can help them to earn much more money from each sale.
First we need to see which variables are the most correlated ones with the Price:
set.seed(123)
str(data)
## 'data.frame': 1359 obs. of 19 variables:
## $ Brand : Factor w/ 76 levels "10.or","Acer",..: 48 57 4 4 36 48 48 58 6 69 ...
## $ Battery.capacity..mAh.: num 0.616 0.599 0.593 0.421 0.599 ...
## $ Screen.size..inches. : num 0.871 0.837 0.837 0.755 0.816 ...
## $ Touchscreen : Factor w/ 2 levels "1","0": 1 1 1 1 1 1 1 1 1 1 ...
## $ Resolution.x : num 0.625 0.438 0.522 0.306 0.438 ...
## $ Resolution.y : num 0.795 0.591 0.673 0.418 0.574 ...
## $ Processor : num 0.778 0.778 0.556 0.556 0.778 ...
## $ RAM..MB. : num 1 0.497 0.33 0.33 0.497 ...
## $ Internal.storage..GB. : num 0.5 0.125 0.125 0.125 0.25 ...
## $ Rear.camera : num 0.444 0.593 0.111 0.111 0.111 ...
## $ Front.camera : num 0.333 0.333 0.25 0.25 0.667 ...
## $ Operating.system : Factor w/ 7 levels "Android","BlackBerry",..: 1 1 4 4 1 1 1 1 1 1 ...
## $ Wi.Fi : Factor w/ 2 levels "1","0": 1 1 1 1 1 1 1 1 1 1 ...
## $ Bluetooth : Factor w/ 2 levels "1","0": 1 1 1 1 1 1 1 1 1 1 ...
## $ GPS : Factor w/ 2 levels "1","0": 1 1 1 1 1 2 1 1 1 1 ...
## $ Number.of.SIMs : Factor w/ 3 levels "1","2","3": 2 2 2 2 1 2 2 2 1 2 ...
## $ X3G : Factor w/ 2 levels "1","0": 1 1 1 1 2 1 1 1 1 2 ...
## $ X4G..LTE : Factor w/ 2 levels "1","0": 1 1 1 1 2 1 1 1 1 2 ...
## $ Price : num 653 310 1183 696 553 ...
data_cor = data[,c(-1, -4, -12, -13, -14, -15, -16, -17, -18)] #Remove non-numerical variables
corr_delay <- sort(cor(data_cor)["Price",], decreasing = T)
corr=data.frame(corr_delay)
ggplot(corr,aes(x = row.names(corr), y = corr_delay)) + geom_bar(stat = "identity", fill = "lightblue") + scale_x_discrete(limits= row.names(corr)) + labs(x = "", y = "Price", title = "Correlations") + theme(plot.title = element_text(hjust = 0, size = rel(1.5)), axis.text.x = element_text(angle = 45, hjust = 1))
We can see that the variable most correlated with Price is Internal.storage..GB., followed by RAM..MB. and the Resolution variables. However, all variables have a correlation above 0.25, showing some relationship with the price.
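The ranking of correlations with Price can also be obtained without hard-coding the column positions of the non-numerical variables. The sketch below uses a small synthetic data frame. The column names storage, ram and brand are hypothetical stand-ins, not the real data set.

```r
set.seed(1)
n <- 200

# Synthetic stand-in for the phones data (assumption: not the real data).
toy <- data.frame(
  storage = runif(n),
  ram     = runif(n),
  brand   = factor(sample(c("A", "B"), n, replace = TRUE))
)
toy$Price <- 300 * toy$storage + 100 * toy$ram + rnorm(n, sd = 20)

# Keep numeric columns only, then read the "Price" row of the correlation matrix.
numeric_cols <- sapply(toy, is.numeric)
price_cor <- cor(toy[, numeric_cols])["Price", ]

# Drop the trivial correlation of Price with itself and sort decreasingly.
price_cor <- sort(price_cor[names(price_cor) != "Price"], decreasing = TRUE)
print(price_cor)
```

Selecting columns with is.numeric avoids breaking the code if the column order of the data frame changes.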
Remembering relationships between variables…
gCor1
gCor2
cor_mat = cor(data_numerical)
heatmap(cor_mat)
We see that the screen’s resolution can almost be represented by one of its axes. This makes sense, as smartphones tend to keep a standard resolution (for example 1080p) that is adjusted depending on the height of the device (y axis). We can see this clearly in graph g12.
g12
This suggests that we might get a better model by eliminating Resolution.x. Nevertheless, we will not delete that feature for now.
Another detail worth mentioning is the relationship between the two memories, internal storage and RAM, which are strongly related to each other, although not as strongly as the resolutions.
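Strongly related pairs such as the two resolutions can also be listed programmatically from the correlation matrix. This is a minimal sketch on synthetic data. The 0.9 threshold and the column names res_y, res_x and ram are assumptions chosen for illustration.

```r
set.seed(2)
n <- 300
res_y <- runif(n)

# Synthetic data: res_x is almost a rescaled copy of res_y, ram is unrelated.
toy <- data.frame(
  res_y = res_y,
  res_x = 0.6 * res_y + rnorm(n, sd = 0.02),
  ram   = runif(n)
)

cm <- cor(toy)

# Look only at the upper triangle so that each pair is reported once.
high <- which(abs(cm) > 0.9 & upper.tri(cm), arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(cm)[high[, "row"]],
  var2 = colnames(cm)[high[, "col"]],
  r    = cm[high]
)
print(pairs)
```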
The goal of this part is to create a regression model built from numerical variables so that we can predict the price of a mobile phone given certain characteristics. We will create new splits, this time using Price with its numerical values.
# set.seed(123)
for_training = createDataPartition(log(data$Price), p = 0.75, list = FALSE)
# 75% for training
training = data[ for_training,]
testing = data[- for_training,]
From now on, we will use training and testing.
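For readers unfamiliar with stratified partitions on a numeric outcome, the idea can be sketched in base R as follows: bin the outcome by quantiles and sample 75% inside each bin. The function stratified_split is a hypothetical helper written for this illustration. It only approximates the behaviour of createDataPartition and is not its actual implementation.

```r
# Hypothetical helper: stratified sampling of row indices on a numeric outcome.
stratified_split <- function(y, p = 0.75, groups = 5) {
  # Quantile-based bins of the outcome
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE)
  # Sample a fraction p of the rows inside each bin
  idx <- unlist(lapply(split(seq_along(y), bins), function(i) {
    i[sample.int(length(i), size = ceiling(p * length(i)))]
  }))
  sort(unname(idx))
}

set.seed(123)
price <- exp(rnorm(400, mean = 5, sd = 0.8))   # synthetic, right-skewed prices

train_idx <- stratified_split(log(price), p = 0.75)
train <- price[train_idx]
test  <- price[-train_idx]
print(c(train = length(train), test = length(test)))
```

Stratifying on the logarithm of the price, as in the split above, keeps cheap and expensive phones represented in both subsets.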
As a first approach to build the best possible model, the fastest idea is to use simple and multiple regression models.
For the simple regression models, we will use the most correlated variables as we have seen in the correlation matrix above. Before that, let’s get some insights on the relationships between the most correlated variables.
Let’s see first the variability of Price:
training %>% ggplot(aes(x=Price)) + geom_density(fill="navyblue")
training %>% ggplot(aes(x=Price / Internal.storage..GB.)) + geom_density(fill="navyblue")
## Warning: Removed 1 rows containing non-finite values (`stat_density()`).
training %>% ggplot(aes(x= Price, y = Internal.storage..GB.)) +
geom_point(fill="navyblue") # Most "constant" variability
# training %>% ggplot(aes(x= log(Price), y = Internal.storage..GB.)) +
# geom_point(fill="navyblue")
# training %>% ggplot(aes(x= Price, y = log(Internal.storage..GB.))) +
# geom_point(fill="navyblue")
# training %>% ggplot(aes(x= log(Price), y = log(Internal.storage..GB.))) +
# geom_point(fill="navyblue")
We see that Price itself has a low variability and that Price per unit of internal storage has an even lower one. This indicates that Internal.storage..GB. is a feature we should not leave out. We also saw that taking logarithms did not help to reduce the variability. Therefore, for our simple model we will regress Price on Internal.storage..GB. alone.
# Simple regression model with just Price and Internal Storage
simple1 = lm(Price ~ Internal.storage..GB., data = training)
summary(simple1) # poor result: adjusted R^2 = 0.4333
##
## Call:
## lm(formula = Price ~ Internal.storage..GB., data = training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -431.50 -37.65 -21.65 6.35 975.38
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 47.821 4.298 11.12 <2e-16 ***
## Internal.storage..GB. 1279.526 45.789 27.94 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 106.1 on 1019 degrees of freedom
## Multiple R-squared: 0.4338, Adjusted R-squared: 0.4333
## F-statistic: 780.9 on 1 and 1019 DF, p-value: < 2.2e-16
cor(predict(simple1, newdata = testing), testing$Price) ^ 2 # tested result
## [1] 0.3979553
par(mfrow=c(2,2))
plot(simple1, pch = 23 ,bg='mediumpurple3', cex = 2)
Despite an R-squared below 0.45, we seem to be on a “good path”: the residuals look reasonably well spread and the Normal Q-Q plot seems acceptable. However, the Scale-Location and Residuals vs Leverage plots show that there is considerable room for improvement, as does the adjusted R-squared of 0.4333. Nonetheless, the R-squared on the test set is below 0.40.
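The test-set figure above is a squared correlation between predictions and observations. It is worth knowing that this measure ignores systematic bias, unlike the definition based on residual sums of squares. The sketch below contrasts the two on made-up numbers. The function names r2_cor and r2_sse are assumptions.

```r
# Squared correlation: invariant to shifting or rescaling the predictions.
r2_cor <- function(pred, obs) cor(pred, obs)^2

# 1 - SSE/SST: penalises biased or badly scaled predictions.
r2_sse <- function(pred, obs) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

obs    <- c(100, 150, 200, 250, 300)          # made-up observed prices
good   <- obs + c(5, -5, 5, -5, 5)            # small, unbiased errors
biased <- 2 * good + 50                       # same ordering, badly calibrated

print(c(cor_good = r2_cor(good, obs), cor_biased = r2_cor(biased, obs)))
print(c(sse_good = r2_sse(good, obs), sse_biased = r2_sse(biased, obs)))
```

Both sets of predictions have the same squared correlation, but only the unbiased one has a high value under the second definition. Reporting RMSE alongside R-squared, as done later in this section, guards against this.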
Now, we will use more variables in search of the best model.
# multiple1 = lm(Price ~ Internal.storage..GB. + RAM..MB. + Resolution.y*
# Resolution.x + Screen.size..inches.,
# data = training)
# summary(multiple1)
# multiple2 = lm(Price ~ Internal.storage..GB.* RAM..MB. + Resolution.y *
# Resolution.x + Screen.size..inches.,
# data = training)
# summary(multiple2)
# multiple3 = lm(Price ~ Internal.storage..GB.* RAM..MB. + Resolution.y *
# Resolution.x + Screen.size..inches. + Rear.camera * Front.camera,
# data = training)
# summary(multiple3) # Adjusted R-squared: 0.593
multiple4 = lm(Price ~ Internal.storage..GB.* RAM..MB. +
Resolution.y * Resolution.x +
Front.camera + Processor * Battery.capacity..mAh.,
data = training)
summary(multiple4)
##
## Call:
## lm(formula = Price ~ Internal.storage..GB. * RAM..MB. + Resolution.y *
## Resolution.x + Front.camera + Processor * Battery.capacity..mAh.,
## data = training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -363.87 -32.83 -11.54 12.17 906.38
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 46.919 15.998 2.933 0.003435 **
## Internal.storage..GB. -214.338 140.355 -1.527 0.127046
## RAM..MB. 118.966 50.464 2.357 0.018590 *
## Resolution.y -2.494 81.804 -0.030 0.975685
## Resolution.x -11.768 65.235 -0.180 0.856876
## Front.camera -115.325 34.493 -3.343 0.000858 ***
## Processor -31.936 34.766 -0.919 0.358523
## Battery.capacity..mAh. 19.075 45.798 0.417 0.677131
## Internal.storage..GB.:RAM..MB. 1587.624 174.215 9.113 < 2e-16 ***
## Resolution.y:Resolution.x 586.521 118.884 4.934 9.44e-07 ***
## Processor:Battery.capacity..mAh. -33.341 77.216 -0.432 0.665993
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 89.96 on 1010 degrees of freedom
## Multiple R-squared: 0.5967, Adjusted R-squared: 0.5927
## F-statistic: 149.5 on 10 and 1010 DF, p-value: < 2.2e-16
cor(predict(multiple4, newdata = testing), testing$Price)^2 # 0.4415989
## [1] 0.4415989
After trying several combinations, the best multiple regression model was the one that pairs numerical variables through interaction terms. On the training set the result looks good, with an adjusted R-squared of 0.5927. However, on the test set the R-squared is only a little above 0.44. This is where we realised that the categorical variables must somehow be taken into account in a numerical model.
We consider that the Brand and Operating system of a mobile phone can be crucial in determining its price. Therefore, we will convert them from categorical to numerical (integer codes) and then normalise them. Then we will repeat the multiple regression procedure and see the result (using re-created splits).
# Brand from categorical to normalised numerical
data$Brand = as.factor(data$Brand)
data$Brand = as.integer(data$Brand)
data$Brand = (data$Brand - min(data$Brand)) / (max(data$Brand) - min(data$Brand))
# Operating System from categorical to normalised numerical
data$Operating.system = as.factor(data$Operating.system)
data$Operating.system = as.integer(data$Operating.system)
data$Operating.system = (data$Operating.system - min(data$Operating.system)) /
(max(data$Operating.system) - min(data$Operating.system))
# We will re-create the partitions
for_training = createDataPartition(log(data$Price), p = 0.75, list = FALSE)
# 75% for training
training = data[ for_training,]
testing = data[- for_training,]
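The conversion applied to Brand and Operating.system above can be wrapped in a small helper. The names min_max and encode_factor are hypothetical, and the example values are made up. Note that the integer codes follow the alphabetical order of the factor levels, so the resulting numbers carry no information about price by themselves.

```r
# Min-max normalisation to the [0, 1] interval.
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Factor -> integer level codes -> normalised numeric.
encode_factor <- function(f) min_max(as.integer(as.factor(f)))

os <- c("Windows", "Android", "BlackBerry", "Android")   # made-up values
codes <- encode_factor(os)

# Levels are sorted alphabetically: Android = 1, BlackBerry = 2, Windows = 3
print(data.frame(os = os, code = codes))
```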
After changing combinations and selecting variables, the best multiple regression model obtained was:
multipleBest = lm(Price ~ Internal.storage..GB.* RAM..MB. + Resolution.y *
Resolution.x + Front.camera +
Brand * Operating.system,
data = training)
summary(multipleBest) # Adjusted R-squared: 0.586
##
## Call:
## lm(formula = Price ~ Internal.storage..GB. * RAM..MB. + Resolution.y *
## Resolution.x + Front.camera + Brand * Operating.system, data = training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -409.90 -34.49 -7.33 13.53 661.09
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.88 13.39 3.650 0.000275 ***
## Internal.storage..GB. -169.15 145.43 -1.163 0.245055
## RAM..MB. 254.33 52.24 4.868 1.31e-06 ***
## Resolution.y -59.17 70.86 -0.835 0.403904
## Resolution.x -95.34 58.50 -1.630 0.103459
## Front.camera -93.05 32.32 -2.879 0.004068 **
## Brand -13.43 11.11 -1.209 0.226921
## Operating.system 472.07 50.12 9.419 < 2e-16 ***
## Internal.storage..GB.:RAM..MB. 1139.90 210.77 5.408 7.94e-08 ***
## Resolution.y:Resolution.x 626.66 100.38 6.243 6.31e-10 ***
## Brand:Operating.system -641.95 88.24 -7.275 6.96e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 87.1 on 1010 degrees of freedom
## Multiple R-squared: 0.59, Adjusted R-squared: 0.586
## F-statistic: 145.4 on 10 and 1010 DF, p-value: < 2.2e-16
cor(predict(multipleBest, newdata = testing), testing$Price)^2 # 0.6151373
## [1] 0.6151373
Despite a slightly lower adjusted R-squared on the training set (0.586 versus 0.593), the R-squared on the test set rose from 0.44 to 0.615. Keep in mind that the split was re-created, so the two test sets are not identical. Even so, we can fairly say that this model predicts the price well in comparison with the simple regression model.
To check whether the model can still be improved, we now inspect the candidate combinations automatically. Libraries such as leaps or olsrr automate this step. We use olsrr, which can select a model by examining all possible subsets, the best subsets, forward and backward stepwise selection, and AIC-based selection. The method that gave us the best results was ols_step_best_subset().
(For the sake of clarity and understanding the project, we will not include all the selecting methods, just the one that got us the best results).
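To make the idea of best-subset selection concrete, the following base R sketch enumerates every subset of three synthetic predictors and keeps the best one of each size by adjusted R-squared. It illustrates the principle only. It is not how olsrr is implemented, and the data and variable names are invented.

```r
set.seed(3)
n <- 150
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 2 * toy$x1 - 1.5 * toy$x2 + rnorm(n, sd = 0.5)   # x3 is pure noise

predictors <- c("x1", "x2", "x3")

# Fit one linear model per subset of predictors and record its quality.
results <- do.call(rbind, lapply(seq_along(predictors), function(k) {
  combos <- combn(predictors, k, simplify = FALSE)
  do.call(rbind, lapply(combos, function(vars) {
    fit <- lm(reformulate(vars, response = "y"), data = toy)
    data.frame(size   = k,
               terms  = paste(vars, collapse = " + "),
               adj_r2 = summary(fit)$adj.r.squared,
               aic    = AIC(fit))
  }))
}))

# Best subset for each model size, judged by adjusted R-squared.
best_per_size <- do.call(rbind, lapply(split(results, results$size), function(d) {
  d[which.max(d$adj_r2), ]
}))
print(best_per_size)
```

The number of candidate models doubles with every extra term, which is why exhaustive search is practical only for a moderate number of predictors.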
library(olsrr)
## Warning: package 'olsrr' was built under R version 4.3.2
##
## Attaching package: 'olsrr'
## The following object is masked from 'package:MASS':
##
## cement
## The following object is masked from 'package:datasets':
##
## rivers
model = Price ~ Internal.storage..GB.* RAM..MB. + Resolution.y *
Resolution.x + Front.camera +
Brand * Operating.system
fittness = lm(model, data = training)
ols_step_best_subset(fittness)
plot(ols_step_best_subset(fittness))
It tells us that the best model overall, balancing complexity and R-squared, is model 7, by a small margin. Let’s see:
# RAM..MB. Resolution.x Front.camera Operating.system Internal.storage..GB.:RAM..MB. Resolution.y:Resolution.x Brand:Operating.system
multipleGoat = lm(Price ~ Internal.storage..GB. * RAM..MB. + Front.camera +
Resolution.y * Resolution.x + Brand * Operating.system,
data = training)
summary(multipleGoat) # Adjusted R-squared: 0.586
##
## Call:
## lm(formula = Price ~ Internal.storage..GB. * RAM..MB. + Front.camera +
## Resolution.y * Resolution.x + Brand * Operating.system, data = training)
##
## Residuals:
## Min 1Q Median 3Q Max
## -409.90 -34.49 -7.33 13.53 661.09
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 48.88 13.39 3.650 0.000275 ***
## Internal.storage..GB. -169.15 145.43 -1.163 0.245055
## RAM..MB. 254.33 52.24 4.868 1.31e-06 ***
## Front.camera -93.05 32.32 -2.879 0.004068 **
## Resolution.y -59.17 70.86 -0.835 0.403904
## Resolution.x -95.34 58.50 -1.630 0.103459
## Brand -13.43 11.11 -1.209 0.226921
## Operating.system 472.07 50.12 9.419 < 2e-16 ***
## Internal.storage..GB.:RAM..MB. 1139.90 210.77 5.408 7.94e-08 ***
## Resolution.y:Resolution.x 626.66 100.38 6.243 6.31e-10 ***
## Brand:Operating.system -641.95 88.24 -7.275 6.96e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 87.1 on 1010 degrees of freedom
## Multiple R-squared: 0.59, Adjusted R-squared: 0.586
## F-statistic: 145.4 on 10 and 1010 DF, p-value: < 2.2e-16
cor(predict(multipleGoat, newdata = testing), testing$Price)^2 # 0.6151373
## [1] 0.6151373
We see that, once written with the * operator (which also adds the main effects of each interaction), this is in fact the multiple model that we suggested. From the plot below we see an overall improvement.
par(mfrow=c(2,2))
plot(multipleGoat, pch = 23 ,bg='mediumpurple3', cex = 2)
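The summary of multipleGoat lists ten coefficients besides the intercept, rather than the seven terms named in the comment, because the * operator in an R formula expands to the main effects plus their interaction. A quick way to verify this, using placeholder variable names a and b:

```r
# a * b is shorthand for a + b + a:b in an R model formula.
f_star <- y ~ a * b
f_long <- y ~ a + b + a:b

labels_star <- attr(terms(f_star), "term.labels")
labels_long <- attr(terms(f_long), "term.labels")

print(labels_star)
print(identical(labels_star, labels_long))
```

To fit an interaction without its main effects, the a:b form would have to be written explicitly.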
Now we are going to continue with other statistical learning regression models. First we prepare the model we have selected
ctrl <- trainControl(method = "repeatedcv",
number = 5, repeats = 1)
model = Price ~ Internal.storage..GB.* RAM..MB. + Resolution.y *
Resolution.x + Front.camera +
Brand * Operating.system
linFit <- lm(model,
data=training)
#summary(linFit)
# to save all the predictors obtained:
test_results <- data.frame(price = testing$Price)
alm_tune <- train(model, data = training,
method = "lm",
preProc=c('scale', 'center'),
trControl = ctrl)
test_results$alm <- predict(alm_tune, testing)
postResample(pred = test_results$alm, obs = test_results$price)
## RMSE Rsquared MAE
## 127.9314108 0.6151373 59.1487843
Not a bad prediction, but relying on it could be risky because the model may be fitted too closely to the training set.
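One way to gauge that risk is to compare the error on the training data with a cross-validated error. The sketch below performs a manual 5-fold cross-validation of a linear model in base R, mirroring the 5-fold setting in ctrl. The data are synthetic, and the helper rmse is an assumption introduced for the example.

```r
set.seed(4)
n <- 200
toy <- data.frame(x1 = runif(n), x2 = runif(n))
toy$y <- 100 + 300 * toy$x1 * toy$x2 + rnorm(n, sd = 30)   # synthetic prices

k <- 5
fold <- sample(rep(seq_len(k), length.out = n))   # random fold label per row

rmse <- function(pred, obs) sqrt(mean((obs - pred)^2))

# Fit on k - 1 folds, evaluate on the held-out fold.
cv_rmse <- sapply(seq_len(k), function(i) {
  fit <- lm(y ~ x1 * x2, data = toy[fold != i, ])
  rmse(predict(fit, newdata = toy[fold == i, ]), toy$y[fold == i])
})

# Error of the model on the same data it was fitted to.
train_rmse <- rmse(fitted(lm(y ~ x1 * x2, data = toy)), toy$y)

print(c(train = train_rmse, cv = mean(cv_rmse)))
```

A cross-validated RMSE far above the training RMSE would be a sign of overfitting.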
qplot(test_results$alm, test_results$price) +
labs(title="Linear Regression Observed VS Predicted", x="Predicted", y="Observed") +
geom_abline(intercept = 0, slope = 1, colour = "blue") +
theme_bw()
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
for_tune <- train(model, data = training,
method = "leapForward",
preProc=c('scale', 'center'),
tuneGrid = expand.grid(nvmax = 4:10),
trControl = ctrl)
for_tune
## Linear Regression with Forward Selection
##
## 1021 samples
## 7 predictor
##
## Pre-processing: scaled (10), centered (10)
## Resampling: Cross-Validated (5 fold, repeated 1 times)
## Summary of sample sizes: 818, 816, 816, 816, 818
## Resampling results across tuning parameters:
##
## nvmax RMSE Rsquared MAE
## 4 91.05288 0.5593826 51.00382
## 5 88.55225 0.5863609 49.71501
## 6 89.10925 0.5827839 50.06133
## 7 88.69971 0.5860553 50.15300
## 8 88.87251 0.5844075 50.36017
## 9 88.78076 0.5853693 50.25454
## 10 88.67707 0.5863621 50.19712
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was nvmax = 5.
plot(for_tune)
We can see that 5 predictors obtained the lowest RMSE (88.55), with 10 and 7 predictors close behind, so we will use 5 for our prediction.
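The selection rule can be reproduced directly from the resampling table printed above. The RMSE values below are copied from that output. The data frame name cv_table is an assumption.

```r
# Cross-validated RMSE per nvmax, copied from the forward-selection output.
cv_table <- data.frame(
  nvmax = 4:10,
  RMSE  = c(91.05288, 88.55225, 89.10925, 88.69971,
            88.87251, 88.78076, 88.67707)
)

# caret picks the tuning value with the smallest RMSE.
best_nvmax <- cv_table$nvmax[which.min(cv_table$RMSE)]

# Gap of every candidate with respect to the best one.
cv_table$gap <- cv_table$RMSE - min(cv_table$RMSE)
print(cv_table[order(cv_table$RMSE), ])
```

The gaps between candidates with 5 or more predictors are well below 1 RMSE unit, so the choice among them is not clear-cut.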
coef(for_tune$finalModel, for_tune$bestTune$nvmax)
## (Intercept) RAM..MB.
## 124.17728 16.45963
## Operating.system Internal.storage..GB.:RAM..MB.
## 60.55494 47.06839
## Resolution.y:Resolution.x Brand:Operating.system
## 49.84204 -47.06419
# We use those variables for our prediction
test_results$frw <- predict(for_tune, testing)
postResample(pred = test_results$frw, obs = test_results$price)
## RMSE Rsquared MAE
## 128.9845147 0.6030451 60.3293853
qplot(test_results$frw, test_results$price) +
labs(title="Forward Regression Observed VS Predicted", x="Predicted", y="Observed") +
geom_abline(intercept = 0, slope = 1, colour = "blue") +
theme_bw()
However, the prediction results on the test set are slightly worse (R-squared of 0.603 versus 0.615).
back_tune <- train(model, data = training,
method = "leapBackward",
preProc=c('scale', 'center'),
tuneGrid = expand.grid(nvmax = 4:10),
trControl = ctrl)
back_tune
## Linear Regression with Backwards Selection
##
## 1021 samples
## 7 predictor
##
## Pre-processing: scaled (10), centered (10)
## Resampling: Cross-Validated (5 fold, repeated 1 times)
## Summary of sample sizes: 817, 817, 817, 817, 816
## Resampling results across tuning parameters:
##
## nvmax RMSE Rsquared MAE
## 4 89.35453 0.5567866 49.48622
## 5 89.61526 0.5542371 49.83512
## 6 88.66141 0.5654053 49.73466
## 7 88.93745 0.5643857 50.05012
## 8 89.05671 0.5645377 50.06268
## 9 88.89159 0.5660702 50.12095
## 10 88.80189 0.5662878 49.97157
##
## RMSE was used to select the optimal model using the smallest value.
## The final value used for the model was nvmax = 6.
plot(back_tune)
Now the optimal value is nvmax = 6, close to the 7 terms chosen by the best-subset search. We see again that models with more variables tend to predict the price better.
coef(back_tune$finalModel, back_tune$bestTune$nvmax)
## (Intercept) RAM..MB.
## 124.17728 29.30339
## Front.camera Operating.system
## -15.23130 59.48330
## Internal.storage..GB.:RAM..MB. Resolution.y:Resolution.x
## 45.18856 49.27088
## Brand:Operating.system
## -46.40438
test_results$bw <- predict(back_tune, testing)
postResample(pred = test_results$bw, obs = test_results$price)
## RMSE Rsquared MAE
## 127.0414554 0.6181511 59.3245515
qplot(test_results$bw, test_results$price) +
labs(title="Backward Regression Observed VS Predicted", x="Predicted", y="Observed") +
geom_abline(intercept = 0, slope = 1, colour = "blue") +
theme_bw()
We see an improvement over our previous best: the R-squared is now 0.618. This is our best model so far.
step_tune <- train(model, data = training,
method = "leapSeq",
preProc=c('scale', 'center'),
tuneGrid = expand.grid(nvmax = 4:10),
trControl = ctrl)
plot(step_tune)
# which variables are selected?
coef(step_tune$finalModel, step_tune$bestTune$nvmax)
## (Intercept) Operating.system
## 124.17728 60.09993
## Internal.storage..GB.:RAM..MB. Resolution.y:Resolution.x
## 57.39614 55.95360
## Brand:Operating.system
## -47.48937
test_results$seq <- predict(step_tune, testing)
postResample(pred = test_results$seq, obs = test_results$price)
## RMSE Rsquared MAE
## 126.6029866 0.6121991 59.1994080
qplot(test_results$seq, test_results$price) +
labs(title="Stepwise Regression Observed VS Predicted", x="Predicted", y="Observed") +
geom_abline(intercept = 0, slope = 1, colour = "blue") +
theme_bw()
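Slightly worse R-squared than the backward model (0.612 versus 0.618), although the RMSE is marginally lower (126.6 versus 127.0).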
Using glmnet we get:
# X matrix
X = model.matrix(model, data=training)
# y variable
y = training$Price
grid = seq(0, .1, length = 100) # a 100-size grid for lambda (rho in slides)
ridge.mod = glmnet(X, y, alpha=0, lambda=grid) # alpha=0 for ridge regression
#dim(coef(ridge.mod))
#coef(ridge.mod)
plot(ridge.mod, xvar="lambda")
ridge.cv = cv.glmnet(X, y, type.measure="mse", alpha=0)
plot(ridge.cv)
opt.lambda <- ridge.cv$lambda.min
opt.lambda # 8.96
## [1] 8.963584
lambda.index <- which(ridge.cv$lambda == ridge.cv$lambda.1se)
beta.ridge <- ridge.cv$glmnet.fit$beta[, lambda.index]
#beta.ridge
And the prediction obtained is
X.test = model.matrix(model, data=testing)
ridge.pred = predict(ridge.cv$glmnet.fit, s=opt.lambda, newx=X.test)
y.test = testing$Price
postResample(pred = ridge.pred, obs = y.test)
## RMSE Rsquared MAE
## 132.2043352 0.5934851 61.3673307
The R-squared obtained is almost 0.60 and the RMSE is 132. These values are slightly worse than those of the previous linear models, so this prediction is not very good.
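To see what the ridge penalty does to the coefficients, here is a minimal closed-form sketch in base R on synthetic data with two nearly collinear predictors. The function ridge_coef is a hypothetical helper. glmnet scales its loss and penalty differently, so the lambda values used here are not directly comparable with the ones above.

```r
# Closed-form ridge estimate on standardised predictors:
# beta = (X'X + lambda * I)^(-1) X'y
ridge_coef <- function(X, y, lambda) {
  Xs <- scale(X)          # centre and scale each predictor
  yc <- y - mean(y)       # centre the response (no intercept needed)
  p  <- ncol(Xs)
  solve(crossprod(Xs) + lambda * diag(p), crossprod(Xs, yc))
}

set.seed(5)
n <- 100
X <- cbind(a = rnorm(n), b = rnorm(n))
X <- cbind(X, c = X[, "a"] + rnorm(n, sd = 0.1))   # c is nearly collinear with a
y <- 3 * X[, "a"] + 2 * X[, "b"] + rnorm(n)

b_ols   <- ridge_coef(X, y, 0)      # lambda = 0 reproduces least squares
b_mid   <- ridge_coef(X, y, 10)
b_large <- ridge_coef(X, y, 1000)

print(round(cbind(ols = b_ols[, 1], mid = b_mid[, 1], large = b_large[, 1]), 3))
```

As lambda grows, the coefficient vector shrinks towards zero, which stabilises the estimates of correlated predictors such as the two resolutions.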
Therefore, we will now try ridge regression through caret.
ridge_grid <- expand.grid(lambda = seq(0, .1, length = 100))
ridge_tune <- train(model, data = training,
method='ridge',
preProc=c('scale','center'),
tuneGrid = ridge_grid,
trControl=ctrl)
plot(ridge_tune)
With this curve we can locate the optimal lambda, which must lie inside the searched grid of 0 to 0.1 (likely around 0.05).
# the best tune
ridge_tune$bestTune
# prediction
test_results$ridge <- predict(ridge_tune, testing)
postResample(pred = test_results$ridge, obs = test_results$price)
## RMSE Rsquared MAE
## 128.6794953 0.6089725 60.0268921
The results obtained are nearly the same as the glmnet ones.
lasso_grid <- expand.grid(fraction = seq(.01, 1, length = 100))
lasso_tune <- train(model, data = training,
method='lasso',
preProc=c('scale','center'),
tuneGrid = lasso_grid,
trControl=ctrl)
plot(lasso_tune)
lasso_tune$bestTune
test_results$lasso <- predict(lasso_tune, testing)
postResample(pred = test_results$lasso, obs = test_results$price)
## RMSE Rsquared MAE
## 130.259396 0.603403 60.045303
Again, the R-squared is about 0.60 and the RMSE is about 130.
elastic_grid = expand.grid(alpha = seq(0, .2, 0.01), lambda = seq(0, .1, 0.01))
glmnet_tune <- train(model, data = training,
method='glmnet',
preProc=c('scale','center'),
tuneGrid = elastic_grid,
trControl=ctrl)
plot(glmnet_tune)
glmnet_tune$bestTune
test_results$glmnet <- predict(glmnet_tune, testing)
postResample(pred = test_results$glmnet, obs = test_results$price)
## RMSE Rsquared MAE
## 128.3381415 0.6137093 59.4068604
No improvement in the prediction.
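For reference, the elastic net blends the ridge and lasso penalties through alpha. The sketch below writes the penalty term in the form used in the glmnet documentation and counts the size of the tuning grid defined above. The function name enet_penalty and the coefficient vector are illustrative assumptions.

```r
# Elastic net penalty: lambda * [ (1 - alpha)/2 * sum(beta^2) + alpha * sum(|beta|) ]
# alpha = 0 gives the ridge penalty, alpha = 1 the lasso penalty.
enet_penalty <- function(beta, alpha, lambda) {
  lambda * ((1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta)))
}

beta <- c(2, -1, 0)   # made-up coefficients

pen_ridge <- enet_penalty(beta, alpha = 0,   lambda = 0.1)
pen_lasso <- enet_penalty(beta, alpha = 1,   lambda = 0.1)
pen_mix   <- enet_penalty(beta, alpha = 0.2, lambda = 0.1)
print(c(ridge = pen_ridge, lasso = pen_lasso, mix = pen_mix))

# Size of the tuning grid used above: 21 alpha values x 11 lambda values.
elastic_grid <- expand.grid(alpha = seq(0, .2, 0.01), lambda = seq(0, .1, 0.01))
print(nrow(elastic_grid))
```

Since the grid only explores alpha between 0 and 0.2, the tuned model stays close to ridge regression, which is consistent with the similar results.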
Now, we will try machine learning models to see if we can improve our last best model (Backward Regression with R-squared = 0.618).
knn_tune <- train(model,
data = training,
method = "kknn",
preProc=c('scale','center'),
tuneGrid = data.frame(kmax=c(11,13,15,19,21),
distance=2 ,
kernel='optimal'),
trControl = ctrl)
plot(knn_tune)
test_results$knn <- predict(knn_tune, testing)
postResample(pred = test_results$knn, obs = test_results$price)
## RMSE Rsquared MAE
## 135.4068769 0.5860647 57.7535950
Worse than the statistical learning models (R-squared = 0.59 and RMSE = 135).
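As a reminder of what the nearest-neighbour regressor computes, here is a simplified base R sketch that averages the response of the k closest training points using Euclidean distance. It deliberately omits the kernel weighting applied by kknn. The function knn_predict and the toy data are assumptions.

```r
# Unweighted k-nearest-neighbour regression (simplified illustration).
knn_predict <- function(train_x, train_y, new_x, k) {
  apply(new_x, 1, function(q) {
    # Euclidean distance from the query point q to every training row
    diffs <- train_x - matrix(q, nrow(train_x), length(q), byrow = TRUE)
    d <- sqrt(rowSums(diffs^2))
    # Average the response of the k closest training points
    mean(train_y[order(d)[seq_len(k)]])
  })
}

train_x <- matrix(c(0, 1, 2, 3, 10), ncol = 1)   # made-up feature
train_y <- c(0, 10, 20, 30, 100)                 # made-up prices
new_x   <- matrix(c(0.9, 9), ncol = 1)

pred <- knn_predict(train_x, train_y, new_x, k = 2)
print(pred)
```

Because distances drive the prediction, centring and scaling the predictors (as done in preProc above) matters much more for this model than for linear regression.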
rf_tune <- train(model,
data = training,
method = "rf",
preProc=c('scale','center'),
trControl = ctrl,
ntree = 100,
tuneGrid = data.frame(mtry=c(1,3,5,7)),
importance = TRUE)
plot(rf_tune)
test_results$rf <- predict(rf_tune, testing)
postResample(pred = test_results$rf, obs = test_results$price)
## RMSE Rsquared MAE
## 132.0761717 0.5894287 54.5428281
No improvement, the R-squared is 0.59 and the RMSE is 132.
xgb_tune <- train(model,
data = training,
method = "xgbTree",
preProc=c('scale','center'),
objective="reg:squarederror",
trControl = ctrl,
tuneGrid = expand.grid(nrounds = c(500,1000),
max_depth = c(5,6,7),
eta = c(0.01, 0.1, 1),
gamma = c(1, 2, 3),
colsample_bytree = c(0.5, 1), # fraction of columns per tree, must lie in (0, 1]
min_child_weight = c(1),
subsample = c(0.2,0.5,0.8)))
## [09:47:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:55:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:55:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:55:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:55:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:56:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:56:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:56:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:56:19] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:56:25] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:56:32] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:56:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:56:45] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:56:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:57:00] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:57:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:57:12] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:57:19] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:57:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:57:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:57:33] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:57:37] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:57:42] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:57:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:57:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:57:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:58:00] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:58:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:58:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:58:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:58:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:58:29] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:58:35] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:58:40] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:58:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:58:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:58:59] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:59:07] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:59:16] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:59:23] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:59:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:59:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:59:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [09:59:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:00:02] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:00:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:00:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:00:16] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:00:19] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:00:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:00:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:00:32] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:00:36] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:00:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:00:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:00:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:00:58] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:01:02] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:01:08] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:01:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:01:19] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:01:25] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:01:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:01:37] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:01:45] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:01:53] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:01:59] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:02:07] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:02:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:02:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:02:29] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:02:37] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:02:40] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:02:45] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:02:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:02:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:02:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:03:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:03:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:03:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:03:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:03:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:03:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:03:29] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:03:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:03:40] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:03:45] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:03:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:03:56] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:04:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:04:07] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:04:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:04:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:04:27] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:04:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:04:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:04:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:04:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:27] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:35] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:05:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:00] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:41] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:48] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:06:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:29] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:33] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:37] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:42] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:07:58] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:07] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:12] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:23] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:08:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:16] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:29] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:36] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:42] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:09:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:09] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:13] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:21] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:25] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:29] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:49] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:10:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:00] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:10] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:20] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:33] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:40] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:52] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:11:59] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:43] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:12:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:00] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:12] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:17] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:28] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:33] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:13:57] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:05] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:25] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:47] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:14:59] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:03] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:07] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:15] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:19] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:29] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:40] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:45] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:50] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:15:56] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:07] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:12] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:25] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:33] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:39] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:16:54] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:00] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:07] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:14] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:26] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:30] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:34] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:42] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:17:55] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:01] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:06] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:16] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:22] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:27] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:32] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:44] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:18:58] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:04] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:11] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:18] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:24] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:31] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:38] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:42] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:46] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
## [10:19:51] WARNING: src/c_api/c_api.cc:935: `ntree_limit` is deprecated, use `iteration_range` instead.
test_results$xgb <- predict(xgb_tune, testing)
postResample(pred = test_results$xgb, obs = test_results$price)
## RMSE Rsquared MAE
## 135.4803666 0.5909701 52.6069315
The R-squared obtained is about 59% and the RMSE is about 135, with an MAE of 52.6. By MAE this is the best individual regression model, but it is very time-consuming to train.
apply(test_results[-1], 2, function(x) mean(abs(x - test_results$price)))
## alm frw bw seq ridge lasso glmnet knn
## 59.14878 60.32939 59.32455 59.19941 60.02689 60.04530 59.40686 57.75360
## rf xgb
## 54.54283 52.60693
# Combination
test_results$comb = (test_results$xgb + test_results$bw)/2
postResample(pred = test_results$comb, obs = test_results$price)
## RMSE Rsquared MAE
## 127.1561924 0.6565751 54.3606503
We obtained the best model so far by averaging the predictions of the XGBoost model and the backward-selection regression. The ensemble gives the best RMSE and R-squared yet, although its MAE is slightly higher than that of XGBoost alone:
RMSE of 127
R-squared = 0.657
MAE = 54
Therefore, for the final prediction we are going to use the ensemble regression.
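The averaging step and the three reported metrics can be reproduced by hand. The sketch below is a minimal illustration on simulated data, not the study's data: `obs`, `pred_a` and `pred_b` are hypothetical stand-ins for `test_results$price`, `test_results$xgb` and `test_results$bw`, and `metrics()` is a hypothetical helper that mirrors what `postResample` reports (RMSE, squared correlation as R-squared, and MAE).

```r
# Simulated stand-ins (assumptions, not the study's data)
set.seed(1)
obs    <- runif(200, 100, 900)          # "real" prices
pred_a <- obs + rnorm(200, sd = 120)    # first model's predictions
pred_b <- obs + rnorm(200, sd = 120)    # second model's predictions

# Hypothetical helper returning the same three metrics as postResample
metrics <- function(pred, obs) {
  c(RMSE     = sqrt(mean((pred - obs)^2)),
    Rsquared = cor(pred, obs)^2,
    MAE      = mean(abs(pred - obs)))
}

# Equal-weight ensemble: average the two prediction vectors
pred_comb <- (pred_a + pred_b) / 2

# Compare the two single models with their average
round(rbind(a    = metrics(pred_a, obs),
            b    = metrics(pred_b, obs),
            comb = metrics(pred_comb, obs)), 3)
```

Averaging helps most when the two models make errors that are not strongly correlated, which is plausible when one is a linear regression and the other a tree-based model. In this simulation the errors are independent by construction, so the gain is larger than one should expect on real data.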
yhat = test_results$comb
head(yhat)
## [1] 526.5573 681.3712 174.0890 515.7914 359.2260 459.1025
hist(yhat, col="lightblue")
y = test_results$price
error = y-yhat
hist(error, col="lightblue")
# use the first 100 errors to estimate the error quantiles
noise = error[1:100]
# 90% prediction intervals for the remaining observations
lwr = yhat[101:length(yhat)] + quantile(noise, 0.05, na.rm=TRUE)
upr = yhat[101:length(yhat)] + quantile(noise, 0.95, na.rm=TRUE)
predictions = data.frame(real=y[101:length(y)],
                         fit=yhat[101:length(yhat)],
                         lwr=lwr,
                         upr=upr)
predictions = predictions %>% mutate(out=factor(if_else(real<lwr | real>upr,1,0)))
# how many real observations are out of the intervals?
mean(predictions$out==1)
## [1] 0.05462185
ggplot(predictions, aes(x=fit, y=real))+
geom_point(aes(color=out)) + theme(legend.position="none") +
geom_ribbon(data=predictions,aes(ymin=lwr,ymax=upr),alpha=0.3) +
labs(title = "Prediction intervals", x = "prediction",y="real price")
We can see that only about 5.5% of the real prices fall outside the 90% prediction intervals, which is below the nominal 10%. The intervals are therefore somewhat conservative, and the regression model works reasonably well.
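The interval construction used here (shifting each prediction by empirical quantiles of held-out errors) can be checked on simulated data. The following is only a sketch under stated assumptions: `yhat` and `y` are simulated, the noise is Gaussian with a standard deviation of 100, and the names `calib`, `evalu` and `outside` are hypothetical.

```r
# Simulated predictions and real values (assumptions, not the study's data)
set.seed(42)
n    <- 1000
yhat <- runif(n, 100, 900)          # hypothetical point predictions
y    <- yhat + rnorm(n, sd = 100)   # hypothetical real prices

err   <- y - yhat
calib <- 1:100      # errors used to estimate the quantiles
evalu <- 101:n      # observations used to check the coverage

# 5% and 95% empirical quantiles of the calibration errors
q <- unname(quantile(err[calib], c(0.05, 0.95), na.rm = TRUE))

# Shift every prediction by the two quantiles to get a 90% interval
intervals <- data.frame(real = y[evalu],
                        fit  = yhat[evalu],
                        lwr  = yhat[evalu] + q[1],
                        upr  = yhat[evalu] + q[2])

# Share of real values outside the interval (nominal target: 0.10)
outside <- mean(intervals$real < intervals$lwr | intervals$real > intervals$upr)
outside
```

With only 100 calibration errors the estimated quantiles are noisy, so the observed share outside the interval can land noticeably above or below 10%. This sketch also assumes the error distribution is the same across the whole price range, which may not hold for real smartphone prices.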
Models with 6 or 7 variables behaved well, which makes sense, since around 2 or 3 pairs of variables were fundamental in describing the price.
In the end, we saw that the best approach was to use a combination of:
Gradient boosting, a sequential method that improves the model tree after tree. This machine learning method is prone to overfitting, but thanks to the feature engineering it not only gave us no problems but also improved the overall model.
Backward regression, a method that depended more on our understanding of the variables and the concepts behind smartphone pricing. We found a few pairs of variables that helped us create a model almost as good as the machine learning models.
To sum up, combining statistical and machine learning procedures, along with proper handling of the data, let us build a model that explains about 66% of the variance in smartphone prices on the test set.